WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(51) International Patent Classification:

C12N 15/52, V2h 9/DO, C12P 21/00 // 
(C12N 1/21, C12R 1:03) 



A1



(11) International Publication Number: WO 98/11230

(43) International Publication Date: 19 March 1998 (19.03.98)



(21) International Application Number: PCT/US96/14791

(22) International Filing Date: 13 September 1996 (13.09.96)



(71) Applicant: BRISTOL-MYERS SQUIBB COMPANY
(US/US); Route 206 & Province Line Road, P.O. Box 4000,
Princeton, NJ 08543-4000 (US).

(72) Inventors: OKI, Toshikazu; 4-20-10, Shodo, Sakae-ku,
Yokohama 247 (JP). DAIRI, Tohru; 5-139, Tsukioka-cho,
Toyama, Toyama 939 (JP).

(74) Agent: BLOOM, Allen; Dechert Price & Rhoads, Princeton
Pike Corporate Center, P.O. Box 5218, Princeton, NJ 08543-
5218 (US).



(81) Designated States: CA, JP, European patent (AT, BE, CH, DE,
DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).



Published 

With international search report.



(54) Title: POLYKETIDE SYNTHASES FOR PRADIMICIN BIOSYNTHESIS AND DNA SEQUENCES ENCODING SAME



[Figure: map showing ORF1, ORF2, ORF3, ORF4, ORF5, ORF6, ORF7, ORF8, ORF9, ORF10 and ORF11]



(57) Abstract 



The present invention provides, inter alia, nucleic acids and corresponding amino acid sequences of several Actinomadura polyketide
synthase genes that are useful, for example, in preparing pradimicin and analogs thereof.



POLYKETIDE SYNTHASES FOR PRADIMICIN BIOSYNTHESIS
AND DNA SEQUENCES ENCODING SAME

The present invention relates, inter alia, to purified nucleic acids
encoding polyketide synthase genes for pradimicin biosynthesis, and
purified polypeptides having polyketide synthase activity. Polyketide
metabolites are natural products made by microorganisms and plants
from simple fatty acids. Many polyketides are used as human and
animal pharmaceuticals such as antibiotics, chemotherapeutics and
growth promoting agents, as well as flavoring agents and pigments.

Biosynthesis of polyketides is believed to occur by a series of
condensations of carbon units in a manner similar to that of long chain
fatty acids, which are formed by fatty acid synthase. The fatty acids are
formed by a process in which a chain starter, usually a 2-carbon acetate
residue, is joined by condensation to a chain extender unit, such
as malonate, to form an even-numbered chain. The resulting β-keto
group is then processed by β-ketoacyl reduction, dehydration and enoyl
reduction. The cycle then begins again with the condensation of a new
extender unit. A typical fatty acid synthase is a multivalent system
involving eight functional units: acetyl, malonyl and palmityl
transferases, acyl carrier protein, ketoacyl synthase, ketoacyl reductase,
dehydratase and enoyl reductase. The organization of these units varies
in different organisms. See, for example, EMBO J. 8:2717-2725
(1989).
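
The chain-length arithmetic implied by this cycle can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes a 2-carbon acetate starter and a net gain of two carbons per malonate condensation, which is why the resulting chain is even-numbered.

```python
# Illustrative arithmetic only; names are hypothetical, not from the source.
STARTER_CARBONS = 2      # assumed: 2-carbon acetate starter unit
CARBONS_PER_CYCLE = 2    # assumed: net carbons gained per malonate condensation


def chain_length(cycles: int) -> int:
    """Carbons in the chain after the given number of elongation cycles."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return STARTER_CARBONS + CARBONS_PER_CYCLE * cycles


def cycles_needed(carbons: int) -> int:
    """Elongation cycles needed to reach an even-numbered chain length."""
    extra = carbons - STARTER_CARBONS
    if extra < 0 or extra % CARBONS_PER_CYCLE != 0:
        raise ValueError("length not reachable from an acetate starter")
    return extra // CARBONS_PER_CYCLE
```

Under these assumptions, a 16-carbon chain such as palmitate corresponds to seven elongation cycles.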

The fatty acid synthesis process differs from polyketide synthesis
since most polyketides contain structural complexities due to the use of
different starter and extender units, such as acetate, propionate and
butyrate. The polyketide synthesis is further complicated by variations
in the extent of processing of the β-carbon (β-ketoreduction,
dehydration, enoylreduction) as well as the introduction of chiral
carbons. See, for example, Science 252:675-679 (1991).

The tetracenomycin C polyketide synthase genes (tcm) from
Streptomyces glaucescens, for example, have been sequenced, and the
sequence data revealed three complete open reading frames. An
analysis of the sequence data resulted in a conclusion that polyketide
synthesis in S. glaucescens involves a multienzyme complex consisting
of at least five types of enzymes. These enzymes, which are
homologous to counterparts involved in fatty acid synthesis, are
presumably involved in the assembly of the tetracenomycin C
decaketide.

Additionally, for example, the structure and function of the
granaticin-producing polyketide synthase gene cluster of Streptomyces
violaceoruber has also been studied. This gene cluster has six open
reading frames, thereby indicating that the granaticin-producing
polyketide synthesis likely consists of at least six separate enzymes
involved in carbon chain assembly. See EMBO J. 8:2717-2725 (1989).
Further, Streptomyces polyketide synthase gene clusters involved in the
biosynthesis of actinorhodin and the whiE spore pigment have also been
described. See J. Biol. Chem. 267:19278-19290 (1992) and Gene
130:107-116 (1993).

The molecular organization of the polyketide biosynthesis genes
of Saccharopolyspora erythraea, which govern synthesis of the
polyketide portion of the macrolide antibiotic erythromycin, is similarly
complex. The genes are organized in six repeated units that encode
fatty acid synthase-like activities. Two repeated units are contained in a
single open reading frame. It is believed that each repeated unit
encodes a functional synthase unit and each synthase unit participates
in one of six fatty acid synthase-like elongation steps required for the
formation of the polyketide. See EMBO J. 8:2727-2736 (1989).

Based on the above data, a model has been proposed in which
polyketide genes have repeated units designated modules, and the
corresponding proteins are called synthase units, wherein each synthase
unit is responsible for one of the fatty acid synthase-like cycles required
for completing the polyketide. Thus, each synthase unit carries the
elements required for the condensation process, for selecting the
particular extender unit to be incorporated, and for the extent of
processing that the β-carbon will undergo. After completion of the
cycle, the nascent polyketide is transferred from the acyl carrier protein
(ACP) it occupies to the β-ketoacyl ACP synthase of the next synthase
unit utilized, where the appropriate extender unit and processing level
are introduced. This process is repeated, using a new synthase unit for
each elongation cycle, until the programmed length has been reached.
According to this model, formation of complex polyketides requires the
participation of a different synthase unit for each cycle, thereby ensuring
that the correct molecular structure is produced. See, for example,
Annu. Rev. Microbiol. 47:875-912 (1993).

An actinomycete, namely, Actinomadura, certain strains of which
were previously isolated from soil samples collected in the Fiji Islands
and in India, was found to produce a complex of antibiotics designated
pradimicin. See, for example, J. Antibiot. 43:755-762 (1990).
Pradimicin A, as shown in Figure 1, has a unique dihydro-
benzo[a]naphthacenequinone aglycon substituted with D-alanine and
two sugars, and is a potent antifungal antibiotic produced, for example,
by Actinomadura hibisca and Actinomadura verrucosospora subsp.
neohibisca. See, for example, J. Antibiot. 43:755-762 (1990) and J.
Antibiot. 46:387-397 (1993). Pradimicin is an antibiotic useful for
multiple purposes, particularly for use as a pharmaceutical. For
example, pradimicin has been shown to have activity against systemic
fungal infections caused by Candida albicans, Aspergillus fumigatus and
Cryptococcus neoformans. Further, pradimicin is active in vitro against
a wide variety of fungi and yeasts, some Gram-positive bacteria, and
viruses. J. Org. Chem. 54:2536-2539 (1989). Purified polypeptides
having polyketide synthase activity and purified nucleic acids encoding
such polypeptides are therefore desirable, for example, to provide
pharmaceutically useful products.






SUMMARY OF THE INVENTION

Until now, the sequences encoding polyketide synthase genes in
Actinomadura had not been identified. These sequences are provided in
the present invention.

One preferred embodiment of the present invention is a
substantially pure nucleic acid comprising a nucleic acid sharing at least
about 75% nucleic acid identity with an open reading frame (ORF) of an
Actinomadura polyketide synthase gene, and more preferably, at least
about 80% identity, and most preferably, at least about 90% identity.
In certain preferred embodiments, the nucleic acid comprises a nucleic
acid selected from the group consisting of SEQ ID NO:1-12. A further
preferred embodiment is a substantially pure nucleic acid comprising a
nucleic acid encoding an Actinomadura polyketide synthase gene
sharing at least about 75% amino acid identity, and more preferably, at
least about 80% identity, and most preferably, at least about 90%
identity with a polypeptide encoded by a nucleic acid selected from the
group consisting of SEQ ID NO:1-12.

In certain preferred embodiments, the substantially pure nucleic
acid comprises a nucleic acid encoding a polypeptide differing from an
Actinomadura polyketide synthase gene by no more than about 20
amino acid substitutions, and more preferably, no more than about 10
amino acid substitutions. Preferably, the substitutions cause a
conservative substitution in the amino acid sequence of the encoded
polyketide synthase. The nucleic acids of the invention also include
nucleic acid analogs.

Further, the present invention provides a substantially pure nucleic
acid comprising a nucleic acid encoding a polypeptide sharing at least
about 75% amino acid identity with a polyketide synthase for
biosynthesis of a benzo(a)naphthacenequinone. Preferably, the nucleic
acid encodes a polypeptide sharing at least about 80%, and more
preferably, at least about 90% amino acid identity with a polyketide
synthase for biosynthesis of a benzo(a)naphthacenequinone. In
preferred embodiments, the
polyketide synthase is an Actinomadura polyketide synthase, and the
polyketide is preferably a dihydrobenzo(a)naphthacenequinone aglycon,
and preferably pradimicin, such as Pradimicin A, B, C, D, E, FA-2,
FL, FS, 11-O-L-xylosylpradimicin H, L, T1, T2 or BMS181184.

Yet another embodiment of the invention is a substantially pure
nucleic acid comprising a nucleic acid that hybridizes, under stringent
conditions, to a nucleic acid comprising a nucleic acid encoding a
polypeptide sharing at least about 75% amino acid identity with an
Actinomadura polyketide synthase. More preferably, the nucleic acid
hybridizes to a nucleic acid comprising a nucleic acid encoding a
polypeptide sharing at least about 80% amino acid identity with an
Actinomadura polyketide synthase, and even more preferably, encoding
a polypeptide sharing at least about 90% amino acid identity with an
Actinomadura polyketide synthase. Most preferably, the nucleic acid
hybridizes with a nucleic acid comprising a nucleic acid selected from
the group consisting of SEQ ID NO:1-12. Such a hybridizing nucleic
acid can be used, for example, to screen for organisms that produce
pradimicin.

The invention additionally includes vectors capable of reproducing 
in a eukaryotic or prokaryotic cell having a nucleic acid described above 
as well as transformed eukaryotic or prokaryotic cells having such 
nucleic acid. 

Thus, another preferred embodiment is a transformed eukaryotic
or prokaryotic cell comprising a nucleic acid encoding a polypeptide
sharing at least about 70% amino acid identity with an Actinomadura
polyketide synthase gene, and more preferably, at least about 80%
identity, and most preferably, at least about 90% identity. Most
preferably, the nucleic acid sequence comprises a nucleic acid selected
from the group consisting of SEQ ID NO:1-12. Preferably, the
transformed cell expresses one of the Actinomadura polyketide synthase
genes described herein.

Yet another preferred embodiment is a vector capable of
reproducing in a eukaryotic or prokaryotic cell comprising a nucleic acid
encoding a polypeptide sharing at least about 70% nucleic acid identity
with an Actinomadura polyketide synthase gene, and more preferably, at
least about 80% identity, and most preferably, at least about 90%
identity. Preferably, the nucleic acid comprises a nucleic acid selected
from the group consisting of SEQ ID NO:1-12. Preferably, the inventive
vector expresses, intracellularly or extracellularly, one of the
Actinomadura polyketide synthases described herein.

Another embodiment of the present invention provides a
substantially pure polypeptide comprising an amino acid sequence
sharing at least about 75% amino acid identity with an Actinomadura
polyketide synthase, and more preferably, at least about 80% identity,
and most preferably, at least about 90% identity. Preferably, the
polypeptide shares at least about 75% amino acid identity with a
polypeptide comprising an amino acid sequence selected from the group
consisting of SEQ ID NO:13-15.

Yet another preferred embodiment is a method of preparing
pradimicin or a pradimicin analog thereof, comprising transforming a
eukaryotic or prokaryotic cell with an expression vector for expressing
intracellularly or extracellularly a nucleic acid comprising a nucleic acid
encoding a polypeptide sharing at least about 70% amino acid identity
with an Actinomadura polyketide synthase, growing the transformed cell
in culture, and isolating the pradimicin or analog thereof from the
transformed cell or the culture medium. Preferably, the polypeptide
shares at least about 80% amino acid identity with an Actinomadura
polyketide synthase, and more preferably, the polypeptide shares at
least about 90% amino acid identity with an Actinomadura polyketide
synthase. Most preferably, the expression vector comprises a nucleic
acid encoding all polyketide synthase genes necessary for synthesis of
pradimicin, such as SEQ ID NO:1.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the chemical structure of two types of pradimicin,
pradimicin A and pradimicin S.

Figure 2 shows conserved amino acid sequences in β-
ketosynthases and acyl transferases for granaticin, tetracenomycin and
actinorhodin. These conserved sequences were used to create two
probes for cloning the polyketide synthase genes in Actinomadura.

Figure 3 shows a restriction map of Actinomadura polyketide
synthase genes, ORFs 1-11.

Figure 4 provides an alignment of the Actinomadura ORF1 gene
product ("A") (SEQ ID NO:13) with a Streptomyces polyketide synthase
gene product for tetracenomycin biosynthesis ("B").

Figure 5 provides an alignment of the Actinomadura ORF2 gene
product ("A") (SEQ ID NO:14) with a Streptomyces polyketide synthase
gene product for actinorhodin biosynthesis ("B").

DETAILED DESCRIPTION

The present invention provides, inter alia, nucleic acids and
corresponding amino acid sequences of Actinomadura polyketide
synthase genes. The polyketide synthases are responsible for the
biosynthesis of pradimicin, such as zwitterionic pradimicins A, B and C,
which are produced, for example, by Actinomadura hibisca, and
pradimicin S, which is produced, for example, by Actinomadura spinosa.
See Figure 1, which provides the chemical structures of pradimicins A
and S. See also J. Antibiot. 43:755-762 (1990). Pradimicin is useful,
for example, as an antibiotic, including use as an anti-fungal and an anti-
viral agent. For example, pradimicin has been shown to have activity
against systemic fungal infections caused by Candida albicans, Aspergillus
fumigatus and Cryptococcus neoformans. Further, pradimicin is active
in vitro against a wide variety of fungi and yeasts, some Gram-positive
bacteria, and viruses. J. Org. Chem. 54:2536-2539 (1989). For
instance, pradimicin is believed to be active against HIV. See, for
example, J. Antibiot. 41:1708 (1988) and Virology 176:467 (1990).

Techniques used in the prior art were not applicable for cloning
pradimicin A biosynthetic genes from Actinomadura hibisca.
Specifically, many antibiotic biosynthetic genes including self-defense
genes in actinomycetes are clustered in a genomic region. The close
linkage between antibiotic biosynthetic genes and self-defense genes
has provided a useful tool for cloning of antibiotic biosynthetic genes,
since transformants carrying antibiotic resistance determinants can be
selected. However, this technique could not be applied to the cloning of
the pradimicin A biosynthetic gene cluster because pradimicin A had not
been shown to have significant antibacterial activity. Therefore, the
polyketide synthase genes for pradimicin A biosynthesis were cloned
from Actinomadura hibisca using oligonucleotide probes based on the
conserved amino acid sequences of other polyketide synthase genes,
followed by cloning of the flanking region of pradimicin A polyketide
synthase genes. Specifically, certain amino acid sequences of β-keto
synthase, acyl transferase and acyl carrier protein of polyketide
synthases are strongly conserved in Streptomyces strains producing
polyketide antibiotics. See Annu. Rev. Microbiol. 47:875-912 (1993)
and J. Biol. Chem. 267:19278-19290 (1992). Based on these
sequences, two oligonucleotide probes were synthesized, as shown in
Figure 2. See also Example 1, which provides experimental details of
the cloning of the pradimicin A polyketide synthase genes.

After screening with an Actinomadura hibisca library, an 8.2 kb
Sac I fragment was identified, which hybridized with these
oligonucleotide probes. By DNA sequencing of the 8.2 kb Sac I
fragment (SEQ ID NO:1), eleven open reading frames (ORFs) were
identified. All of the ORFs except for ORF10 are believed to be translated in
the same direction. Referring to SEQ ID NO:1, ORF1 spans from
position 72 (beginning with GTG) to position 1347 (ending with TGA);
ORF2 spans from 1346 (GTG) to 2567 (TGA); ORF3 spans from 2594
(ATG) to 2855 (TGA); ORF4 spans from 2854 (ATG) to 3313 (TGA);
ORF5 spans from 3312 (GTG) to 3771 (TGA); ORF6 spans from 3794
(ATG) to 4817 (TGA); ORF7 spans from 4857 (ATG) to 5595 (TGA);
ORF8 spans from 5594 (GTG) to 5933 (TGA); ORF9 spans from 5932
(GTG) to 6241 (TAA); ORF10 spans, in reverse direction, from 7534
(ATG) to 6301 (TAG) and ORF11 spans from 7668 (ATG) to 8010
(TGA).
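
The ORF coordinates listed above can be tabulated and checked with a short script. The following is a minimal sketch, not part of the disclosed method: the `Orf` type and helper names are hypothetical, and it assumes that each quoted position is the first base of the named start or stop codon. Under that reading, a start position one base before the preceding stop position means the start codon overlaps the stop codon (GTGA or ATGA).

```python
from typing import List, NamedTuple, Tuple


class Orf(NamedTuple):
    name: str
    start: int        # position quoted for the start codon
    stop: int         # position quoted for the stop codon
    start_codon: str
    stop_codon: str


# Coordinates as quoted in the text, relative to SEQ ID NO:1.
ORFS = [
    Orf("ORF1", 72, 1347, "GTG", "TGA"),
    Orf("ORF2", 1346, 2567, "GTG", "TGA"),
    Orf("ORF3", 2594, 2855, "ATG", "TGA"),
    Orf("ORF4", 2854, 3313, "ATG", "TGA"),
    Orf("ORF5", 3312, 3771, "GTG", "TGA"),
    Orf("ORF6", 3794, 4817, "ATG", "TGA"),
    Orf("ORF7", 4857, 5595, "ATG", "TGA"),
    Orf("ORF8", 5594, 5933, "GTG", "TGA"),
    Orf("ORF9", 5932, 6241, "GTG", "TAA"),
    Orf("ORF10", 7534, 6301, "ATG", "TAG"),  # reverse direction
    Orf("ORF11", 7668, 8010, "ATG", "TGA"),
]


def is_reverse(orf: Orf) -> bool:
    """An ORF read in the reverse direction has start > stop."""
    return orf.start > orf.stop


def span(orf: Orf) -> int:
    """Distance in nucleotides between the quoted start and stop positions."""
    return abs(orf.stop - orf.start)


def overlapping_pairs(orfs: List[Orf]) -> List[Tuple[str, str]]:
    """Adjacent forward ORFs whose start codon overlaps the previous stop codon."""
    pairs = []
    for prev, nxt in zip(orfs, orfs[1:]):
        if is_reverse(prev) or is_reverse(nxt):
            continue
        if nxt.start == prev.stop - 1:
            pairs.append((prev.name, nxt.name))
    return pairs
```

Under the stated assumption, every span is a whole number of codons, and the overlapping pairs are ORF1/ORF2, ORF3/ORF4, ORF4/ORF5, ORF7/ORF8 and ORF8/ORF9, which appear to be the same pairs listed under translational coupling in Table 1.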

Each of the deduced ORFs has a significant similarity to a protein
responsible for polyketide biosynthesis or spore color formation in other
organisms. ORF1, ORF2 and ORF3 have particularly strong similarities
(50%-70% amino acid identity) with polyketide synthases for
actinorhodin biosynthesis. See, for example, Figure 4, which provides
an alignment of the ORF1 gene product with a Streptomyces polyketide
synthase gene product for tetracenomycin biosynthesis, and Figure 5,
which provides an alignment of the ORF2 gene product with a
Streptomyces polyketide synthase gene product for actinorhodin
biosynthesis. See also Table 1 below.

Table 1

ORF     Number of     Molecular   Translational
        amino acids   weight      coupling
        Homologous proteins

ORF1    426           44,440      Unknown
        Hypothetical protein 4 of Sac. hirsuta (73% identity
        among 413 amino acids)
        tcm Ia gene of S. glaucescens (73%/412)
        gra I gene of S. violaceoruber (71%/413)
        act I ORF1 of S. coelicolor (69%/415)

ORF2    408           41,610      ORF1/ORF2
        act I ORF2 of S. coelicolor (57%/397)
        tcm Id gene of S. glaucescens (54%/403)
        Beta-ketoacyl synthase chain 2 of S. cinnamonensis
        (50%/397)

ORF3    8®
        Hypothetical protein 6 of Sac. hirsuta (51%/78)
        Granaticin-producing PKS acyl carrier protein of
        S. violaceoruber (53%/75)
        Actinorhodin-producing PKS acyl carrier protein of
        S. coelicolor (51%/75)

ORF4    154           17,694      ORF3/ORF4
        Hypothetical protein 7 of S. coelicolor (58%/149)
        PKS cyclase curF of S. cyaneus (61%/142)
        tcmH protein of S. glaucescens (52%/149)

ORF5    154           15,784      ORF4/ORF5
        Hypothetical protein 6 of Myxococcus xanthus
        (46%^9r
        Histidine protein kinase divJ of Caulobacter
        crescentus (26%/102)
        Multicatalytic endopeptidase complex chain Y7 of
        Sac. cerevisiae (23%/105)

ORF6    342           37,004
        tcmN protein of S. glaucescens (47%/330)
        Carminomycin 4-O-methyltransferase of S. peucetius
        (30%/317)
        O-demethylpuromycin O-methyltransferase of
        S. amOatus (33%/334)

ORF7    247           25,583
        3-ketoacyl-ACP reductase fabG of E. coli
        (38%/244)
        Granaticin-producing PKS chain 5 of S. violaceoruber
        (30%/251)
        Granaticin-producing PKS chain 6 of S. violaceoruber
        (35%/252)

ORF8    114           12,986      ORF7/ORF8
        Hypothetical protein 1 of S. coelicolor (24%/80)

ORF9    104           11,279      ORF8/ORF9
        Hypothetical protein 1 of S. coelicolor (24%/91)
        Hypothetical protein 6 of Sac. hirsuta (27%/4e)
        Hypothetical 41.2 kD protein of S. halstedii
        (24%/91)

ORF10   412           44,857
        Cytochrome P450 105B1 of S. griseolus (40%/404)
        Cytochrome P450 P450CVIIB1 of Sac. erythraea
        (38%/405)
        Cytochrome P450 105C1 of Streptomyces sp.
        (41%/323)

ORF11   115           13,036
        Hypothetical protein 7 of S. coelicolor (51%/107)
        curG protein of S. cyaneus (46%/106)
        tcmI protein of S. glaucescens (35%/105)

Mol. Gen. Genet. 240:146-150 (1993).
EMBO J. 8:2727-2736 (1989).
EMBO J. 8:2717-2725 (1989).
J. Biol. Chem. 267:19278-19290 (1992).
Mol. Gen. Genet. 234:254-264 (1992).
Mol. Microbiol. 4:1679-1691 (1990).
Gene 117:131-136 (1992).
J. Bacteriol. 174:1810-1820 (1992).
EMBL data library no. S32173.
Proc. Natl. Acad. Sci. 89:10297-10301 (1992).
Mol. Cell. Biol. 11:344-353 (1991).
J. Bacteriol. 175:3900-3904 (1993).
Gene 109:55-61 (1991).
J. Biol. Chem. 267:5751-5754 (1992).
Gene 130:107-116 (1993).
J. Bacteriol. 173:3335-3345 (1990).
J. Bacteriol. 174:725-735 (1992).
J. Bacteriol. 172:3644-3653 (1990).
EMBL data library no. S27691.

DNA regions homologous to the Actinomadura polyketide
synthase genes were specifically found in all of the pradimicin producers
examined, but not in pradimicin non-producers in genomic Southern
hybridization, thereby providing evidence that the genes cloned encode
polyketide synthases for pradimicin biosynthesis.

Thus, the present invention provides, inter alia, nucleic acids
encoding Actinomadura polyketide synthase genes and polypeptides and
analogs thereof, including nucleic acids that bind to an Actinomadura
polyketide synthase gene. The nucleic acids can be used, for example,
to screen for organisms that produce pradimicin or that have
homologous polyketide synthase gene sequences. Further, the nucleic
acids can be used, for instance, to synthesize polyketide synthases,
which can in turn be used, for example, to produce pradimicin.

The Actinomadura species include but are not limited to
Actinomadura hibisca, Actinomadura verrucosospora, and particularly
subsp. neohibisca, Actinomadura libanotica, Actinomadura echinospora,
Actinomadura chengduensis, Actinomadura kijaniata, Actinomadura
atramentaria, Actinomadura citrea, Actinomadura cremea, Actinomadura
fulvescens, Actinomadura viridis, Actinomadura roseoviolacea,
Actinomadura verrucosospora, Actinomadura madurae, Actinomadura
pelletieri and, for example, other soil isolates.

1. Nucleic Acids

The present invention provides, inter alia, nucleic acids. The
nucleic acid embodiments of the invention are preferably
deoxyribonucleic acids (DNAs), both single- and double-stranded, and
most preferably double-stranded deoxyribonucleic acids. However, they
can also be ribonucleic acids (RNAs), as well as hybrid RNA:DNA
double-stranded molecules.

Nucleic acids encoding an Actinomadura polyketide synthase gene
include all Actinomadura polyketide synthase gene-encoding nucleic
acids, whether native or synthetic, RNA, DNA, or cDNA, that encode an
Actinomadura polyketide synthase gene, or the complementary strand
thereof, including but not limited to nucleic acid found in an
Actinomadura polyketide synthase gene-expressing organism. For
recombinant expression purposes, codon usage preferences for the
organism in which such a nucleic acid is to be expressed are
advantageously considered in designing a synthetic polyketide synthase-
encoding nucleic acid.

Further, the present invention provides a substantially pure nucleic
acid comprising a nucleic acid encoding a polypeptide sharing at least
about 75% amino acid identity with a polyketide synthase for
biosynthesis of a benzo(a)naphthacenequinone. Preferably, the nucleic
acid encodes a polypeptide sharing at least about 80%, and more
preferably, at least about 90% amino acid identity with a polyketide
synthase for biosynthesis of a benzo(a)naphthacenequinone. In
preferred embodiments, the
polyketide synthase is an Actinomadura polyketide synthase, and the
polyketide is preferably a dihydrobenzo(a)naphthacenequinone aglycon,
and preferably pradimicin, such as Pradimicin A, B, C, D, E, FA-1, FA-2,
FL, FS, H, 11-O-L-xylosylpradimicin H, L, S, T1, T2 or BMS181184.
For a description of the foregoing pradimicins, see, for example, J.
Antibiot. 41:1701 (1988), J. Org. Chem. 54:2536 (1989), J. Antibiot.
43:771 (1990), J. Antibiot. 43:1223 (1990), J. Antibiot. 46:265
(1993), J. Antibiot. 46:398 (1993), J. Antibiot. 46:406 (1993), J.
Antibiot. 46:598 (1993), and J. Antibiot. 46:1589 (1993).

In addition to nucleic acids encoding an Actinomadura polyketide
synthase gene, the present invention includes nucleic acids encoding
polypeptides that are homologous to or share a percentage amino acid
identity with Actinomadura polyketide synthases.

Numerous methods for determining percent homology are known
in the art. One preferred method is to use version 6.0 of the GAP
computer program for making sequence comparisons. The program is
available from the University of Wisconsin Genetics Computer Group
and utilizes the alignment method of Needleman and Wunsch, J. Mol.
Biol. 48, 443, 1970, as revised by Smith and Waterman, Adv. Appl.
Math. 2, 482, 1981.
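
For orientation, the global alignment idea behind the Needleman and Wunsch method can be sketched in a few lines. This is a simplified illustration and not the GAP program: the function name, the scoring values (match +1, mismatch -1) and the linear gap penalty of -1 are all assumptions chosen for brevity.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two sequences by dynamic programming.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative
    assumptions, not the parameters of any particular program.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Production tools generally use substitution matrices and separate gap-opening and gap-extension penalties, so scores from this sketch are not comparable to theirs.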

Numerous methods for determining percent identity are also
known in the art, such as use of the FASTA computer program, which
is also available from the University of Wisconsin. Preferably, the
program used to determine percent identity is the DNASIS program,
which is available from Hitachi Corp. (Tokyo, Japan).
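
As a rough illustration of what a percent identity figure expresses, the following sketch counts identical positions in a pair of already-aligned sequences. It does not reproduce FASTA or DNASIS; the function name is hypothetical, and the choice to exclude gapped columns from the denominator is an assumption, since programs differ on this convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of compared (ungapped) columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumed convention: skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared
```

A result can then be compared against the thresholds recited in the text, for example `percent_identity(a, b) >= 75.0`.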

To construct non-naturally occurring Actinomadura polyketide
synthase gene-encoding nucleic acids, the native sequences can be used
as a starting point and modified to suit particular needs. The nucleic
acids of the invention include, for example, the nucleic acids of SEQ ID
NO:1-12.

The invention is also directed to a nucleic acid encoding a
segment of an Actinomadura polyketide synthase gene. Preferably, the
encoded polypeptide will be effective to perform its function, such as
an enzymatic function, that is performed by the full-size polyketide
synthase.

For identifying the active domain or domains of Actinomadura
polyketide synthase genes, one approach is to take an Actinomadura
polyketide synthase gene cDNA and create deletional mutants lacking
segments at either the 5' or the 3' end by, for instance, partial digestion
with S1 nuclease, Bal 31 or Mung Bean nuclease (the latter approach
described in literature available from Stratagene, San Diego, CA, in
connection with a commercial deletion cloning kit). Alternatively, the
deletion mutants are constructed by subcloning restriction fragments of
an Actinomadura polyketide synthase gene cDNA. The deletional
constructs are cloned into expression vectors and tested for their
polyketide synthase activity.

These structural genes can be altered by mutagenesis methods
such as that described by Adelman et al., DNA 2:183 (1983) or
through the use of synthetic nucleic acid strands. The products of
mutant genes can be tested for polyketide synthase activity.

The nucleic acid sequences can be further mutated, for example,
to incorporate useful restriction sites. See Maniatis et al., Molecular
Cloning, a Laboratory Manual (Cold Spring Harbor Press, 1989). Such
restriction sites can be used to create "cassettes," or regions of nucleic
acid sequence that are facilely substituted using restriction enzymes and
ligation reactions. The cassettes can be used to substitute synthetic
sequences encoding mutated Actinomadura polyketide synthase amino
acid sequences.

Actinomadura polyketide synthase gene-encoding sequences can
be, for instance, substantially or fully synthetic. See, for example,
Goeddel et al., Proc. Natl. Acad. Sci. USA, 76, 106-110 (1979). For
recombinant expression purposes, codon usage preferences for the
organism in which such a nucleic acid is to be expressed are
advantageously considered in designing a synthetic Actinomadura
polyketide synthase gene-encoding nucleic acid. Since the nucleic acid
code is degenerate, numerous nucleic acid sequences can be used to
create the same amino acid sequence.
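
The extent of this degeneracy can be made concrete by counting how many distinct coding sequences specify a given peptide. The sketch below assumes the standard genetic code and one-letter amino acid codes; the table and function names are hypothetical and introduced only for illustration.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (assumed).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(peptide: str) -> int:
    """Number of distinct nucleotide sequences encoding the peptide.

    Stop codons are not counted; an unknown residue raises KeyError.
    """
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide.upper())
```

Because the count is a product over residues, it grows very quickly with length, which is what leaves room to choose codons preferred by the expression host.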

Further, with an altered amino acid sequence, numerous methods
are known to delete sequences from or mutate nucleic acid sequences
that encode a polypeptide and to confirm the function of the
polypeptides encoded by these deleted or mutated sequences.
Accordingly, the invention also relates to a mutated or deleted version
of an Actinomadura polyketide synthase nucleic acid that encodes a
polypeptide that preferably retains polyketide synthase activity.

Conservative mutations are preferred. Such conservative
mutations include mutations that switch one amino acid for another
within one of the following groups:

1. Small aliphatic, nonpolar or slightly polar residues: Ala, Ser, Thr,
Pro and Gly;

2. Polar, negatively charged residues and their amides: Asp, Asn,
Glu and Gln;

3. Polar, positively charged residues: His, Arg and Lys;

4. Large aliphatic, nonpolar residues: Met, Leu, Ile, Val and Cys;
and

5. Aromatic residues: Phe, Tyr and Trp.
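The five groups above can be expressed as a simple lookup. The following is a minimal sketch, with hypothetical names, that reports whether a substitution stays within one group; it reads the garbled entry in group 4 as Ile.

```python
# The five groups of the specification, using three-letter residue codes.
CONSERVATIVE_GROUPS = [
    {"Ala", "Ser", "Thr", "Pro", "Gly"},  # small aliphatic, nonpolar or slightly polar
    {"Asp", "Asn", "Glu", "Gln"},         # negatively charged residues and their amides
    {"His", "Arg", "Lys"},                # positively charged
    {"Met", "Leu", "Ile", "Val", "Cys"},  # large aliphatic, nonpolar
    {"Phe", "Tyr", "Trp"},                # aromatic
]


def is_conservative(original, replacement):
    """True if the two different residues fall within the same group."""
    if original == replacement:
        return False  # no substitution was made
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


if __name__ == "__main__":
    print(is_conservative("Leu", "Ile"))  # same group
    print(is_conservative("Asp", "Lys"))  # different groups
```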

A preferred listing of conservative substitutions is the following: 



Original Residue    Substitution

Ala                 Gly, Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser
Gln                 Asn
Glu                 Asp
Gly                 Ala, Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Tyr, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu



The types of substitutions selected may be based on the analysis of the
frequencies of amino acid substitutions between homologous proteins of
different species developed by Schulz et al., Principles of Protein
Structure, (Springer-Verlag, 1978), pp. 14-16, on the analyses of
structure-forming potentials developed by Chou and Fasman,
Biochemistry 13: 211 (1974) or other such methods reviewed by Schulz
et al., Principles of Protein Structure, (Springer-Verlag, 1978), pp. 108-
130, and on the analysis of hydrophobicity patterns in proteins
developed by Kyte and Doolittle, J. Mol. Biol. 157: 105-132 (1982).
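As one illustration of the hydrophobicity analysis cited above, the following minimal sketch applies the Kyte and Doolittle hydropathy scale to compare a residue before and after substitution and to compute a sliding-window profile. The function names and the default window length of 9 are assumptions made here, not parameters taken from this specification.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter residue code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_shift(original, replacement):
    """Change in hydropathy caused by replacing one residue with another."""
    return KYTE_DOOLITTLE[replacement] - KYTE_DOOLITTLE[original]


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy over each window of consecutive residues."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # A Leu-to-Ile change shifts hydropathy far less than Leu-to-Asp.
    print(hydropathy_shift("L", "I"), hydropathy_shift("L", "D"))
```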

2. Polypeptides 

In addition to analogs of nucleic acid sequences, the present
invention includes analogs of Actinomadura polyketide synthases that
preferably retain polyketide synthase activity. Preferably, the analogs
will share at least about 75% amino acid identity, more preferably, at
least about 80% identity, even more preferably, at least about 85%
identity, even more preferably at least about 90% identity, and most
preferably at least about 95% identity to an Actinomadura polyketide
synthase, such as the polypeptide of SEQ ID NO: 13, SEQ ID NO: 14 or
SEQ ID NO: 15.
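The identity thresholds above can be checked against a pair of pre-aligned sequences. The following is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length with "-" as the gap character, and identity is taken as matching columns divided by all columns that are not gaps in both sequences. Other definitions of the denominator are in use, and the specification does not fix one. All names are hypothetical.

```python
# Preference tiers recited in the specification, highest first.
IDENTITY_TIERS = (95, 90, 85, 80, 75)


def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # column is empty in both sequences
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def highest_tier_met(identity):
    """Return the highest recited threshold met, or None if below 75%."""
    for tier in IDENTITY_TIERS:
        if identity >= tier:
            return tier
    return None


if __name__ == "__main__":
    identity = percent_identity("MKTAYIAK", "MKTAYLAK")
    print(identity, highest_tier_met(identity))
```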


3. Methods of Synthesizing Polypeptides 

In one embodiment, the polypeptides of the invention are made as
follows, using a gene fusion. For example, fusion to maltose-binding
protein ("MBP") can be used to facilitate the expression and purification
of a polyketide synthase in a prokaryote such as E. coli. The hybrid
protein can be purified, for example, using affinity chromatography using
the binding protein's substrate. See, for example, Gene 67: 21-30
(1988). When using a fusion protein that includes maltose binding
protein, a cross-linked amylose affinity chromatography column can be
used to purify the protein.

The cDNA specific for a given polyketide synthase or analog
thereof can also be linked using standard means to a cDNA for
glutathione S-transferase ("GST"), found on a commercial vector, for
example. The fusion protein expressed by such a vector construct
includes the polyketide synthase or analog and GST, and can be treated
for purification.

Should the MBP or GST portion of the fusion protein interfere
with function, it is removed by partial proteolytic digestion approaches
that preferentially attack unstructured regions, such as the linkers
between MBP or GST and the polyketide synthase. The linkers are
designed to lack structure, for instance using the rules for secondary
structure-forming potential developed by Chou and Fasman,
Biochemistry 13, 211, 1974. The linker is also designed to incorporate
protease target amino acids, such as, for trypsin, arginine and lysine
residues. To create the linkers, standard synthetic approaches for
making oligonucleotides are employed together with standard subcloning
methodologies. Fusion partners other than GST or MBP can also
be used.
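The placement of trypsin target residues in a linker can be checked with a short sketch. It uses the commonly cited rule that trypsin cleaves on the carboxyl side of arginine or lysine except when the next residue is proline. The linker sequence shown is hypothetical and is not taken from this specification.

```python
def trypsin_sites(sequence):
    """Return 1-based positions of Lys/Arg residues after which trypsin
    is expected to cleave (Lys or Arg not followed by Pro)."""
    sequence = sequence.upper()
    return [
        i + 1
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    ]


if __name__ == "__main__":
    # Hypothetical flexible linker containing one Lys and two Arg residues;
    # the Arg at position 8 is followed by Pro and is therefore skipped.
    linker = "GSGKGSGRPGSGRG"
    print(trypsin_sites(linker))
```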

Additionally, the Actinomadura polyketide synthases can be
directly synthesized from nucleic acid (by the cellular machinery)
without use of fusion partners. For instance, nucleic acids having the
sequence of any of SEQ ID NO: 1-12 are subcloned into an appropriate
expression vector having an appropriate promoter and expressed in an
appropriate organism. Antibodies against Actinomadura polyketide
synthases can be employed to facilitate purification.

Additional purification techniques are applied as needed,
including without limitation, preparative electrophoresis, FPLC
(Pharmacia, Uppsala, Sweden), HPLC (e.g., using gel filtration,
reverse-phase or mildly hydrophobic columns), gel filtration, differential
precipitation (for instance, "salting out" precipitations), ion-exchange
chromatography and affinity chromatography (including affinity
chromatography using the RE1 duplex nucleotide sequence as the
affinity ligand).

A polypeptide or nucleic acid is "isolated" in accordance with the
invention in that the molecular cloning of the nucleic acid of interest, for
example, involves taking an Actinomadura polyketide synthase gene
nucleic acid from a cell, and isolating it from other nucleic acids. This
isolated nucleic acid may then be inserted into a host cell, which may be
yeast or bacteria, for example. A polypeptide or nucleic acid is
"substantially pure" in accordance with the invention if it is
predominantly free of other polypeptides or nucleic acids, respectively.
A macromolecule, such as a nucleic acid or a polypeptide, is
predominantly free of other polypeptides or nucleic acids if it constitutes
at least about 50% by weight of the given macromolecule in a
composition. Preferably, the polypeptide or nucleic acid of the present
invention constitutes at least about 60% by weight of the total
polypeptides or nucleic acids, respectively, that are present in a given
composition thereof, more preferably about 80%, still more preferably
about 90%, yet more preferably about 95%, and most preferably about
100%. Such compositions are referred to herein as being polypeptides
or nucleic acids that are 60% pure, 80% pure, 90% pure, 95% pure, or
100% pure, any of which are substantially pure.

4. Means for Identifying Polypeptides with Actinomadura Polyketide
Synthase Activity

In one aspect, the present invention provides methods for
identifying polypeptides that are homologous to an Actinomadura
polyketide synthase using an Actinomadura polyketide synthase cDNA,
for example.

Additionally, probes for Actinomadura polyketide synthase
expression can be used, for example, to detect the presence of an
Actinomadura polyketide synthase. Such probes include antibodies
directed against an Actinomadura polyketide synthase or fragments
thereof, nucleic acid probes that hybridize, under stringent conditions, to
an Actinomadura polyketide synthase mRNA, and oligonucleotides that
specifically prime a PCR amplification of an Actinomadura polyketide
synthase mRNA. Nucleic acid molecules that bind to an Actinomadura
polyketide-encoding nucleic acid under high stringency conditions are
identified functionally, or by using the hybridization rules reviewed in
Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold
Spring Harbor Press, 1989).

Many deletional or mutational analogs of nucleic acid sequences
for an Actinomadura polyketide synthase are effective hybridization
probes for Actinomadura polyketide synthase-encoding nucleic acid.
Accordingly, the present invention relates to nucleic acids that hybridize
with such Actinomadura polyketide synthase-encoding nucleic acids
under stringent conditions. Preferably, the nucleic acid of the present
invention hybridizes, under stringent conditions, with at least a segment
of any of the nucleic acids described as SEQ ID NO: 1-12.

"Stringent conditions" refers to conditions that allow for the
hybridization of substantially related nucleic acids, where relatedness is
a function of the sequence of nucleotides in the respective nucleic acids.
For instance, for a nucleic acid of 100 nucleotides, such conditions will
generally allow hybridization thereto of a second nucleic acid having at
least about 85% homology, and more preferably having at least about
90% homology. Such hybridization conditions are described by
Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold
Spring Harbor Press, 1989).

PCR (polymerase chain reaction) can be used to detect nucleic
acids having Actinomadura polyketide synthase sequences through
amplification of such sequences using Actinomadura polyketide
synthase nucleic acid primers. PCR methods of amplifying nucleic acids
utilize at least two primers. One of these primers is capable of
hybridizing to a first strand of the nucleic acid to be amplified and of
priming enzyme-driven nucleic acid synthesis in a first direction. The
other is capable of hybridizing to the reciprocal sequence of the first
strand (if the sequence to be amplified is single stranded, this sequence is
initially hypothetical, but is synthesized in the first amplification cycle)
and of priming nucleic acid synthesis from that strand in the direction
opposite the first direction and towards the site of hybridization for the
first primer. Conditions for conducting such amplifications, particularly
under preferred high stringency conditions, are well known. See, for
example, PCR Protocols (Cold Spring Harbor Press, 1991).
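The geometry of the two primers described above can be sketched as follows. This is a simplified illustration that requires exact matches, whereas actual priming tolerates some mismatch depending on stringency. The template and primer sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def amplicon(template, forward_primer, reverse_primer):
    """Return the segment of the template's top strand bounded by the two
    primers, or None if either primer finds no exact site.

    The forward primer matches the top strand directly. The reverse primer
    hybridizes to the top strand, so its reverse complement is searched
    for downstream of the forward primer site.
    """
    template = template.upper()
    forward_primer = forward_primer.upper()
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


if __name__ == "__main__":
    # Hypothetical 16-nucleotide template and 5-nucleotide primers.
    print(amplicon("AAACCCGGGTTTACGT", "AAACC", "ACGTA"))
```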

Antibodies against Actinomadura polyketide synthases can also
be used to identify polypeptides that are homologous to Actinomadura
polyketide synthases. Antigens for eliciting the production of antibodies
against an Actinomadura polyketide synthase can be produced
recombinantly by expressing all of or a part of the nucleic acid of an
Actinomadura polyketide synthase in a bacterium or a yeast or other
eukaryotic cell line. In one embodiment the recombinant protein is
expressed as a fusion protein, with the non-Actinomadura polyketide
synthase portion of the protein serving either to facilitate purification or
to enhance the immunogenicity of the fusion protein. For instance, the
non-Actinomadura polyketide synthase portion comprises a protein for
which there is a readily-available binding partner that is utilized for
affinity purification of the fusion protein. The antigen includes an
"antigenic determinant," i.e., a minimum portion of amino acids
sufficient to bind specifically with an anti-Actinomadura polyketide
synthase antibody.

Antisera to an Actinomadura polyketide synthase can be made,
for example, by creating an Actinomadura polyketide synthase antigen
by linking a portion of the cDNA for Actinomadura polyketide synthase
to a cDNA for glutathione S-transferase ("GST") found on a commercial
vector. The resulting vector expresses a fusion protein containing an
antigenic segment of an Actinomadura polyketide synthase and GST
that is readily purified from the expressing bacteria using a glutathione
affinity column. The purified antigenic fusion protein is used to
immunize rabbits. The same approach is used to make antigens based
on other segments of Actinomadura polyketide synthase. Procedures
for making antibodies and for identifying antigenic segments of proteins
are well known. See, for instance, Harlow, Antibodies, Cold Spring
Harbor Press, 1989.

5. Polyketides

In addition to polyketide synthases, the present invention also
provides polyketides, including purified pradimicin and pradimicin
analogs, and methods for synthesizing polyketides. For example, a
vector containing a nucleic acid comprising SEQ ID NO:1 can be
expressed in an organism, preferably Streptomyces, thereby resulting in
pradimicin A synthesis. Preferably, all of the polyketide synthase genes
required for polyketide synthesis are present in a single vector, and the
genes are preferably in the same configuration as the cDNA.

Preferred Streptomyces organisms for polyketide synthesis
include, for example, Streptomyces lividans, Streptomyces coelicolor and
Streptomyces griseus. Preferred vectors for expression include, for
example, plasmids pIJ61, pIJ702 and pIJ922, which are described in
Hopwood et al., Genetic Manipulation of Streptomyces, A Laboratory
Manual (The John Innes Foundation, Norwich, UK 1985). Preferably,
the vector includes a promoter that functions well at idiophase, which is
a stage of secondary metabolite production, such as the promoter of the
mel gene, which is present in vector pIJ702.

Preferred methods for preparing a polyketide such as pradimicin or
an analog thereof comprise transforming a eukaryotic or prokaryotic cell
with an expression vector for expressing intracellularly or extracellularly
a nucleic acid comprising a nucleic acid encoding a polypeptide sharing
at least about 70% amino acid identity with an Actinomadura polyketide
synthase, growing the transformed cell in culture, and isolating the
pradimicin or analog thereof from the transformed cell or the culture
medium. Preferably, the polypeptide shares at least about 80% amino
acid identity with an Actinomadura polyketide synthase, and more
preferably, the polypeptide shares at least about 90% amino acid
identity with an Actinomadura polyketide synthase. Most preferably,
the expression vector comprises a nucleic acid encoding all polyketide
synthase genes necessary for synthesis of pradimicin, such as SEQ ID
NO:1. The production of pradimicin A, for example, can be detected by
the presence of a red pigment. Purification of pradimicin from
Actinomadura, for example, is described in J. Antibiot. 41:1701-1704
(1988).

The present invention is further exemplified by the following
non-limiting example.



Example 1. Cloning of Actinomadura Polyketide Synthase Genes
Bacterial strains and plasmids

Escherichia coli XL1-Blue and pSE101 (Biosci. Biotech. Biochem.
59:1835-1841 (1995)), a shuttle cosmid vector replicable in both
Streptomyces lividans and E. coli, were used for preparation of an
Actinomadura hibisca genomic library. E. coli XL1-Blue and plasmids
pUC118 and pUC119 were used for sequencing analysis.

DNA isolation and manipulation 

Plasmid and genomic DNA isolations were done by the method of
Hopwood et al., Genetic Manipulation of Streptomyces, A Laboratory
Manual (The John Innes Foundation, Norwich, UK 1985). Plasmids
from E. coli were prepared with the Qiagen Plasmid Kit (Qiagen Inc.,
Chatsworth, CA). All restriction enzymes, T4 ligase and calf intestinal
alkaline phosphatase were obtained from Takara (Kyoto, Japan). The
procedure for library preparation is described, for example, in Mol. Gen.
Genet. 236:39-48 (1992).

DNA hybridization 

The hybridization conditions employed for reactions with the
oligonucleotide probe, 32P-labeled with T4 kinase, were as follows: a
nylon membrane with immobilized DNA was prehybridized at 40°C for
4 hours in 6X SSC buffer, which contains 5X Denhardt's solution
(Maniatis et al., Molecular Cloning: A Laboratory Manual (Cold Spring
Harbor Laboratory Press, 1982)), 0.5% SDS and 100 µg/ml of heat
denatured salmon sperm DNA. For overnight hybridization, the same
buffer and temperature conditions were used. The genomic DNA
blotted filter and plasmid DNA blotted filter were washed twice with 6X
SSC buffer at 40°C for 30 minutes and with 0.6X SSC buffer at 60°C
for 1 hour, respectively.

Cloning of the genes homologous to type II PKS genes

Amino acid sequences of β-keto synthase, acyl transferase and
acyl carrier protein of polyketide synthases are strongly conserved in
Streptomyces strains producing polyketide antibiotics. See Annu. Rev.
Microbiol. 47:875-912 (1993) and J. Biol. Chem. 267:19278-19290
(1992). Based on these sequences, two oligonucleotide probes were
synthesized. One was designed based on the amino acid sequences of
the Streptomyces β-keto synthase around the cysteine residue which is
thought to be an active site of the enzyme. See Figure 2, probe 1 (SEQ
ID NO:16). The other probe was synthesized based on the amino acid
sequences of the Streptomyces acyl transferase around the serine
residue which is believed to be a catalytic domain. See Figure 2, probe
2 (SEQ ID NO:17). Genomic DNA from Actinomadura hibisca P157-2
(ATCC 53557) that was digested with several restriction enzymes was
subjected to Southern blot analysis with probes 1 and 2, which were
separately labeled with 32P and then mixed. Weak but specific signals
could be detected. To clone the hybridized fragment, a library was
prepared from the strain P157-2 and screened by colony
hybridization with probes 1 and 2 under the same conditions as those for
genomic Southern analysis. Several positive cosmid clones were found
to hybridize to the probes. Two clones, designated pPRM1 and
pPRM14, were selected for further analysis.

The physical maps of pPRM1 and pPRM14 were determined and
are shown in Figure 3. Using Southern blot hybridization analysis of
chromosomal DNA of the strain P157-2 with these two cosmid clones
as probes, it was confirmed that the inserted DNAs of pPRM1 and
pPRM14 had not been structurally rearranged during the construction of
the library. The position of the hybridized region with oligonucleotide
probes was defined by Southern blot analysis.

Sequence analysis

The 8.2-kb SacI fragment prepared from pPRM1 was cloned into
the SacI sites of pUC118 and pUC119 (pUC118 and pUC119 are
available, for example, from Takara Syuzo, Kyoto, Japan). After
construction of a series of plasmids subcloned from these plasmids,
single stranded DNAs were prepared with helper phage M13 K07,
which is also available, for example, from Takara Syuzo. Sequencing
was done by the dideoxy chain termination method of Sanger et al.,
using an automatic DNA sequencer ALF (Pharmacia, Sweden). It was
also done with [α-35S]-dCTP as the radioactive label.

Nucleotide sequence of the DNA fragment hybridized to the probe
As one approach to examine whether the DNA fragment
hybridized to the probes carries the PKS gene for biosynthesis of PRM
A, the nucleotide sequence of the 8.2-kb SacI fragment containing the
hybridized region was determined. Computer analysis of the DNA
sequence, using Frame Analysis (See Gene 30:157-166 (1984)),
revealed eleven ORFs (ORF1-11), which are oriented in the same
direction except for ORF10. To understand the functions of each of the
ORFs deduced by DNA sequencing, databases, including DNASIS, were
searched using their translated products. The results are summarized in
Table 1, infra. The ORF1, ORF2 and ORF3 gene products show strong
similarities (44-73% amino acid identity) with ORF 1, 2 and 3 gene
products of gra (EMBO J. 8:2717-2725 (1989)), tcm (EMBO J. 8:2727-
2736 (1989)) and act (J. Biol. Chem. 267:19278-19290 (1992)), which
are known to encode condensing enzyme, acyltransferase and acyl
carrier protein for granaticin, tetracenomycin and actinorhodin
biosynthesis, respectively. The proteins encoded by ORF4 and ORF6
have similarities with the N- and C-terminal halves of the TcmN protein (J.
Bacteriol. 174:1810-1820 (1992)) (52% and 46% amino acid identity,
respectively), which is thought to be a multifunctional
cyclase/dehydratase participating in tetracenomycin biosynthesis. The
ORF7 gene product is homologous to the fabG product of E. coli (J. Biol.
Chem. 267:5751-5754 (1992)) (3-ketoacyl-ACP reductase, 38% amino
acid identity) and granaticin-producing polyketide synthase chains 5 and
6 (EMBO J. 8:2717-2725 (1989)) (30% and 35% amino acid identity,
respectively). Both of the ORF8 and ORF9 gene products have some
similarity to hypothetical protein 1 participating in spore color formation
in Streptomyces coelicolor (Mol. Microbiol. 4:1679-1691 (1990)) (23
and 24% amino acid identity, respectively) in a limited region. The
ORF10 gene product has a significant similarity to a variety of
monooxygenases, including cytochrome P450 (28-40% amino acid
identity). The ORF11 gene product shows similarity with the
hypothetical protein 1 participating in spore color formation in
Streptomyces coelicolor (Mol. Microbiol. 4:1679-1691 (1990)) (51%
amino acid identity), and less extensive, although significant, similarity
with the CurG protein of Streptomyces cyaneus (Gene 117:131-136 (1992))
(45% amino acid identity) and the tcmI protein of Streptomyces
glaucescens (EMBL data library no. S27691) (35% amino acid identity).
The ORF5 gene product shows some similarity to a histidine kinase of
Caulobacter crescentus (Proc. Natl. Acad. Sci. 89:10297-10301
(1992)) and multicatalytic endopeptidase of S. cerevisiae (Mol. Cell.
Biol. 11:344-353 (1991)).
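The idea behind a frame analysis of GC-rich DNA can be illustrated with a simplified sketch that reports G+C content at each of the three codon positions for a chosen reading frame. In GC-rich actinomycete genes the third codon position is typically the most GC-rich, so a frame whose third position shows the highest G+C content is a candidate coding frame. This is an assumption-laden illustration with hypothetical names, and it is not the Frame Analysis program cited above.

```python
def gc_by_codon_position(sequence, frame=0):
    """Return percent G+C at codon positions 1, 2 and 3 for the given
    reading frame (0, 1 or 2). Characters other than A, C, G, T are
    skipped, which also tolerates unreadable bases in a transcription."""
    sequence = sequence.upper()
    gc_counts = [0, 0, 0]
    totals = [0, 0, 0]
    for i in range(frame, len(sequence)):
        base = sequence[i]
        if base not in "ACGT":
            continue
        position = (i - frame) % 3
        totals[position] += 1
        if base in "GC":
            gc_counts[position] += 1
    return [
        100.0 * gc / total if total else 0.0
        for gc, total in zip(gc_counts, totals)
    ]


if __name__ == "__main__":
    # Toy input: in frame 0 every third base is G.
    print(gc_by_codon_position("ATGATGATG", frame=0))
```

Applied in a sliding window along all three frames of each strand, such values give a rough picture of where protein-coding regions lie and in which frame.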
SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: Oki, Toshikazu
Dairi, Tohru

(ii) TITLE OF INVENTION: POLYKETIDE SYNTHASES FOR PRADIMICIN
BIOSYNTHESIS AND DNA SEQUENCES ENCODING SAME

(iii) NUMBER OF SEQUENCES: 25

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Dechert Price & Rhoads
(B) STREET: Princeton Pike Corporate Center, P.O. Box 5218
(C) CITY: Princeton
(D) STATE: NJ
(E) COUNTRY: USA
(F) ZIP: 08543-5218

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Bloom, Allen
(B) REGISTRATION NUMBER: 29,135
(C) REFERENCE/DOCKET NUMBER: BMS-X25

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (609) 520-3214
(B) TELEFAX: (609) 520-3259

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8169 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



GAGCTCGGCC ACGTCGACAC CGAGGAGCTG CCCGCCGCCG ACGAGCAGGG GCTCGACGTC 60
CGGGGCCGCA CGTGAGCGGA CCGCAGGGGG GCGGGCCGCG CCGCCGTGCG ATCACCGGCA 120
TCGGGGTGCT CGCGCCCGGC GGCTCGGGCC GGAAGGCGTT CTGGAACCTG CTGACCGACG 180
GCCGCACCGC GACCCGGAAG ATCTCGCTGT TCGACCCGGC GGGCTTCCGG TCCCGGATCG 240
CCGCCGAGTG CGACTTCGAC CCCGCCGCCG AGGGGCTGAC GCCCCGCGAG GTCCGGCGCA 300
TGGACCGGGC CGCGCAGCTC GCGGTGGTGT CGGCGCGCGA GGCGCTCGCC GACAGCGGGC 360
TGGGGGCGGG CGAGGGCGAC CCGGCGCGGT TCGCGGTGTC GCTCGGCAGC GCCGTCGGCT 420
GCACGATGGG GCTGGAGGAC GAGTACGTCG TGGTCAGCGA CCAGGGCCGC GACTGGCTGG 480
TCGACCACTC CTACGGCGTG CCGCACCTGT ACCGGCACCT GGTGCCCAGC TCGCTGGCGG 540
CCGAGGTCGC CTGGGCGGGC GGGGCCGAGG GCCCGGTCAC GCTGATCTCG ACGGGCTCGA 600
CCTCCGGGCT CGACGCGGTC GGGCACGGCG CGCGCGTCAT CGCCGAGGGC TCGGCGGACG 660
TGGCGCTCGC CGGGGCCACC GACGCGCCCA TCTCGCCGAT CACGGTGGCG TGCTTCGACG 720
CCATCCGGGC GACCTCGCCG AACAACGACG ACCCCGAGCA CGCGTCCCGG CCGTTCGACC 780
GGGAGCGCAA CGGGTTCGTG CTCGGCGAGG GCGCGGCGGT GTTCGTCCTG GAGGAGCTGG 840
AGCACGCCCG CCGCCGGGGC GCGCACGTCT ACTGCGAGGT CGCGGGGTAC GCCACGCGCG 900
GCAACGCCTA CCACATGACG GGCCTGAAGC CCGACGGCCG CGAGATGGCC GAGGCGATCA 960
GGGTGGCGAT GGACGCCGCC CGGGTCGCCC CGGCCGACCT CGACTACATC AACGCGCACG 1020
GCTCGGGCAC CAAGCAGAAC GACCGGCACG AGACGGCCGC GTTCAAGCGC AGCCTCGGCG 1080
AGCGCGCCTA CGAGCTGCCG GTCAGCTCCA TCAAGTCGAT GGTCGGGCAC TCGCTCGGCG 1140
CGATCGCCTC GATCGAGCTG GCCGCGTGCG CGCTGGCGAT CGAGCACGGT GTGGTGCCGC 1200
CGACCGCCAA CCTGCACAAC GCCGACCCCG AATGCGACCT GGACTACGTG CCGCTGGTGG 1260
CCCGCCAGGC CCCGATCCGC ACGCTCCTGA GCCTGGGCAG CCGCTTCGCC GGCTTCCAGT 1320
CCGCCACCGT CCTGCGGGAG GCCGCGTGAG CGTCCTGACG GCGGACGCGC CGGCGGTCAC 1380
CCGGATCGGC GTGCTCGCCC CGACCGCGAT CCCCGTCGAG GAGCACTCCG CGGCGACCTT 1440
GCGCGGCGTC CCGGTCATCG GGCCGCTGAC CAGGTTCGAC GCCGCCCGCT ACCCGTCCCC 1500
GTTCGGCGGC GAGGTGCCCG GGTTCGACGC CGCCGAGCGC GTCCCGGGGC GCCTCATCCC 1560
GCAGACCCAC CACTGGACGC ACCTGGCGCT GGCCGCCACC GACCTCGCCC TCGCCGACGC 1620
GGGCGTGGTC CCGGCCGAGC TGCCCGAGTA CGAGATGGCG GTGGTGACCG CCAGCTCGTC 1680
GCGCGGCGTG GAGTTCGGGC AGCGCGACAT CCACGCGTTG TGGCGGGACG GGCCCCGGCA 1740
CGGCGGGGCC TACCAGTCGA TCGCCTGGTT CTACCCGGCG ACGACCGGCC AGATCTCCAT 1800
CCCGCACGGG ATGCGCGGCC CCTGCGGCGT CGTGGTCGCC GAGCAGGCCG GGGCGCTGGA 1860
GTCGTTCGCG CAGGCCCGCC GCTACCTGGC GGACGGGGCG CGGCTCGTGG TCTCCGGCCC 1920
CACCGACGCG CCGTTCAGTC CGTACGGCCT GACCTGCCAG CTCGGCAGCG GGCGGCTTAG 1980
CACGGGTGCC GACCCGGCCC GCGCCTACCT GCCGTTCGAC GCCGCCGCGA ACGGCTTCGT 2040
GCCGGGCGAG GGCCGCGCGA TCCTCATCAT CGAGCAAGCC GCCACCGCGC AGGACCGCTC 2100
CTACGGGCGG ATCGCGGGCT ACGCGGCGAC CTTCGACCCG CCGCCGGGCT CGCGCCGCCC 2160
TCCGACGCTG GAGCGAGCCG TGCGCGCCGC CTTGGACGAC GCCCGGCTCA CACCCGCCGA 2220
CGTGGACGTG GTGTTCGCCG ACGCGGCGCG CGTCCCGGAT CTCGACCGCG CGCAGGCCGA 2280
CGCGATCGGC GCGGTCTTCG GGCCGCGCGG CGTGCCCGTC ACCGCGCCCA AGAGCCTGAC 2340
CGGCCGCCTG TACGCGGGCG GCCCCGCGCT CGACGCCGCG ACGGCGCTGC TGGCCATGCA 2400
CGACTCGGTG ATCCCGCCGA CGGCCGGCGG CGCGGACGTC CCGCCCGGCT ACGCGCTCGA 2460
CCTGGTCGGC GCGGAACCGC GCCCGGCCCG GCTGCGCACC GCyiCTGATCA TCGCCCGCGG 2520
CTACGGGGGC TTCAACGCCG CCCTGGTGCT GCGCGGCCCG AACACCTGAC AACGACCCGA 2580
GAGGACGGAC GAGATGGCAA CCCGCGAACG CACCATCGAC GACCTGCGCG CGCTGATGCG 2640
CGCCGCCGTC GGCGAGGCCG ACGACATCGA CCTGGACGGC GACATCCTCG ACTCCACCTT 2700
CACCGAGCTG GAGTACGACT CGCTCGCCCT GCTGGAGCTC GCGGCCCGCA TCGAGACGCA 2760
CTGGGCCCTG CTCATCCCCO AGCJACCACGC GTCCCGGCTC GAGACCCCCC CCATGTTCCT 2820
CGACTACGTG AACGOGCGCG CGGTGGCCGA GCGATGACGC AGTGGCGCAC CGACAGCGTG 2880
ATCGTGATCG ACGCGCCGCT CGACGTCGTC TGGGACATGA CCAACGACGT CGCCTCCTGG 2940
CCGGAGCTGT TCGACGAGTA CGCCTCGGCC GAGATCCTGG AGCGCCACCG CGACACCGTC 3000
CGCTTCCCGC TGACGATCCA CCCCGACGCC GACGCCAACC CCTGGTCOTG GGTGTCGGAG 3060
CSCACGCCCG ACCGCGCCGC GCTCACCGTC AACGCGCACC GCGTGCAGAC CCGCTCGTTC 3120
GAGCACATCA ACCTGCGCTG GCACTACCGC GAGGTGCCCG GCGGCGTGGA GATGCGCTGG 3180
CGGCAGGACT TCGCGATGAA GGAGGCGTCG CCGGTGTCGC TGGCGGCGAT GACCGAGCGC 3240
ATCCAGAGCA ACTCCCCCGT CCAGATGAAG CTGATCAAGG ACAAGGTGGA GCGGGCGGCC 3300
CGGGGCGCGC GGTGATCGAG TTCCTGCTCC CGGTCGCGCT GCTCGGCAAC GGGTTGTGCG 3360
CGGGCGTGCT GACGGGCAGC GTCCTCGGCG TCGTGCCGTA CTACCGGACG CTGCCCGAGG 3420
ACCGCTACAT CGCCGCGCAC GCCTTCGCGC TCGGCCGCTA CGACCCGTTC CAGCCCGTGT 3480
GCCTGCTGGT CACGGTGGCG GCCGACGCGG TCGCGGCGGC GGTCGCGCCG ACCGCCGCCG 3540
CCCGGGTGCT CTGCGCGCTC GCCGCCGTGC TCGCCCTGGC GGTGCTGGCG ATCTCGCTCA 3600
CCCGCAACGT GCCGATGAAC CGCCGGATCA AGCGCCTGGA CCCGGCCGCG CCGCCCCCCG 3660
GGTTCAGCGC GCCCGCGTTC CTGCGCCGCT GGGCGGGCTG GAACGCGGCG CGCACCGGCC 3720
TGACGCTGGC CGCCCTTCTC AGCAACACGG CCGCCCTCGG CGTGCTGCTG TGACCGATCG 3780
GGAAGGGAGG GACATGACCG AACCGGAAGG ACCGCACGCC GCGAGCCTGC GGCTCCAATC 3840
TCTGCTGGAC GGCATGCGCG TCGCCAAGGT CCTCCAGGTG CTCGCCGAAC TCCAGGTGGC 3900
CGACGCGGTC GCCGACGGCC CCTGCAAGCC CGCCGAGATC GCCGCCGACG TCGCCGCCGA 3960
CCCCGACGCG CTGTACCGGG TGCTGCOCTG CGCCGCCTCC TTCGGGGTGT TCACCGACGA 4020
CGAGGACCCC CGGTTCGGGC TCACCCCGAT GCCCGCGCTO CTGCGCACCG GCACCGACOA 4080
CAGCCACCGC GACCTGTTCA TGATGGCGGC GGGCGACCTG TGGTGGCGGC CGTACGGCGA 4140
GCTGCTGGAG ACGGTGCGGA CCCCCCGCCC CGCCGCCGAC CTCGCGTTCG GGATGCCGTT 4200
CTACGACTAC CTCGGCACCG ACCCGGCCGC CGCCCGGCTC TTCGACCGCG CGATGACGCA 4260
CGTCAGCAAG GGCCAGGCGA AGGCGATCCT CGGCCGCTGC TCGTTCGAGC GGTACGCCCG 4320
GATCGCCGAC GTGGGCGGCG GCCACGGCTA CTTCCTCGCG CAGGTGTTGC GCAGCAGCCC 4380
GCGCACCGAG GGCGTGCTGC TGGACCTGCC GCACGTGGTG GCCGGAGCCC CGGCGGTGCT 4440
GGAGAAGCAC GAGGTCGCCG ACCGCGTCCA GGTCGTCCCG GGCAGCTTCT TCGACGCGCT 4500
GCCCACCGGC TGCGACGCCT ACCTGCTGAA AGCGATCCTC ATCAACTGGC CCGACGCCGA 4560
CGCCGAACGC ATCCTGCACC GGGTGCCGCA GGCGATCGGC AACGACCGCG ACGCGCGGCT 4620
GCTGGTGGTC GAGCCCGTCG TCCCGCCCGG CGACGTCCGC GACTACAGCA AGGCCACCGA 4680
CATCGACATG CTCGCCATCA TCGGCGGGCG GCAGCGCACC GTCGCCGAGT GGCGGCGGCT 4740
GCTGCGCGCG GGCGGCTTCG AGCTGGTGGG CGAGCCCACG CCGGGCCGCC GCGAGGTGAT 4800
GGAGTGCCGC CCCATCTGAA CCCGTCCCAC CCGTCGCCCA CATCCAGGGA GAACGCATGA 4860
CCGACACATC GTTCGCCGGC AAGAACGCGC TGATCACCGG CGGCACCCGG GGCATCGGCC 4920
GGGCCGTCGC GCTCGGCCTG GCCGGCGCCG GGGCCAATGT CACCGTCTGC TACCGCAGCG 4980
ACGCCGAGTC CGCCGCCGCG ATGGAAGCCG AGCTGGCCGC CACCGACGGC AAGCACCACG 5040
TCCTCCAGGC CGACATCGGC AACGCCGGGG ACGTCCGCCG CCTGCTGGAC GAGGTCGCCG 5100
CCCGCATGGG CTCGCTOGAC GTAGTCGTGC ACAACGCCGG GCTGATCAGC CACGTGCCGT 5160
TCGCCGACCT GGAGCCCGAG GAGTGGCACC GGATCGTCGA CTCCAACCTG ACCGGCATGT 5220
ACCTGGTGGT GCGGGCCGCG CTGCCGCTGC TGTCGGAGGG CGGCGCGGTC GTCGGCGTCG 5280
GCTCCAAGGT CGCGCTCGTC GGCATCTCGC AGCGCACCCA CTACACCGCC GCCAAGGCCG 5340
GGCTCATCGG GTTCGTGCGC TCGCTCAGCA AGGAGCTGGG GCCGCTCGGC ATCCGGGTCA 5400
ACCTGGTCGC GCCCGGCATC ACCGAGACCG ACCAGGCCGC GCACCTGCCC CCCGTGCAGC 5460
GCGAGCGCTA CCAGAGCATG ACCGCGCTCA AGCGGCTCGG CCAGGCCGAC GAGGTCGCCG 5520
ACGTGGTCCT GTTCCTCGCC CGTCCCGGCG CGCGCTACGT CACCGGCGAG ACCGTCAACG 5580
TGGACGGGGG GATGTGACCA TGGCCGACAG CGGCCCGGTG TTCCGGGTGA TGCTCCGGAT 5640
GGAGATCGTC CCGGGCAGGG AGGCGGAGTT CGAGCGGGTC TGGTACTCGG TCGGCGACAC 5700
CGTCAGCGGC AACCCCGCCA ACCTCGGCCA GTGCGTGCTG CGCAGCGACG ACGAGGAGAG 5760
CGTCTACTAC ATCATGAGCG ACTGGATCGA CGAGGCGCGG TTCCGCGAGT TCGAGCGCAG 5820
CGACGGCCAC GTCTAGCACC GCCGCAAGCT GCACCCGTAC CGGGTGAAGG GCAGCATGGC 5880
GACGATGAAG GTCGTGCACG ACCTCGGCCG CGCGGCGGCG GAGCCGGTCC GGTGACGGCC 5940
GGGCAGGTGC GGGTCCTGGT CCGCTACCAG GCTCCGGGCG ACGACCCCGA GGCCGTCGTC 6000
CAGGCGTACA AGCTGGTCTG CGAGGAACTG CGCGGGACGC CCGGCCTGCT CGGCAGCGAG 6060
CTGCTGGCGT CGCACGCTCG ACGAGGGACG GTTCGCGGTG CTGAGCCTGT GGAGCGA0GC 6120
CGCGCGGTTC CAGGAATGGG AGCAGGGCCC GGCGCACAAG GGCCAGACGT CCGGCCTGCG 6180
CCCGTTCCGG GACACCTCTT CGGGGCGCGG CTTCGATTTC 7ACGAAGTGG TGCACGCCCT 6240
GTAAGAACAA CGAAGGGCCC GGCACGCGCA TGGCGTGCCG GGCCCTTTCA CATCCGTGCC 6300
TACCAGGCGA TGGGCAGCGC GTCCGGCCGC GCGAACGCCA AGCCGGGCCG CCAGGTGATG 6360
TCGGCATCGT CGATAGCGAG ACGCAGCGCG GGCGTCCGCT CCACCAGCGT CTCCAGCACG 6420
ACCTGAAGCT CCAGCCGGGC GAGCGGCGCG CCCAGGCAGT AGTGGATGCC GTGGCCGAGC 6480
GCGATGTGCG GGTTGTCGGT ACGGCCGAGG TCGAGTTCCT CGGGATCGGC GAACACCTCC 6540
GGATCGCGGT TGGCGGCGTT GAAAAGCGGG ATGACCGCCT CGCCCGCGCG CACGAGGGTG 6600
CCGCCGACTT CCACATCCTC GACCGCGATG CGGATCGCGC CCGCGCCCCC GCCGATCTGC 6660
CCGTACCGTA GCAGTTCCTC AACGCCCGCC GGGATACCCG ACGGGTCCTC GCGCAGCCGC 6720
GCGTACCGCG ACGGCTCGCG CAGCAGGTGG TAGACCGAGT CCGTGATCGC CGCCGTGGTG 6780
CTGTGGTAAC CCGCCGCCAG CAGCGTCATG CCGAAGGTGA GCAGTTCCTC CTCGCTGAGG 6840
CCGTCGTCGG CGTGCGCCGG GCTCAGCAAC GACAGCAGGT CGTCGGCGGG CGCGGCCGTC 6900
TTGGCGTCGA TCAGCTCGGC GAGGTAGCCG CGCAGCCGCC CGACCGCGGC CTTGATCTCG 6960
TCGGCCTGCG CGAGAGCGGG CGCGCCGATG GTGAGCATCC GGTCGGTCCA GTCCTGGAAG 7020
CGCGGCCGAT CCTCCGGCGG AACGCCCAGC ATCTCGCAGA TGACGGTGAC CGGCAGCGGC 7080
AGCGCCAGGT GCGCGATCAG GTCGGCGGGC GGGCCGTGCT CGACCATCTC GTCCACGAAC 7140
CCCGACGTCA GGTCGCGCAC GTGCGCGCGC ATCCCCTCCA CACGACGGGC GGTGAACGCG 7200
CGAGACACGA TCTTGCGCAT CCTCGTGTGC TCGGGCGGGC TCATGATGAC CAGCGACTTG 7260
GAGCCGCGCT GCATCGGGAT CAGGCGCGGC GCGCCCGGCC GGGTCACCGC CTCCTTGCTG 7320


AAGCGCCGGT 


CCGAGGTGAC 


GAACCGGACG 


CTGGCGTAGC 


GCGTCACGAC 


CCACGCGT6G 


7380 
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TCGCCGCTCG GCAGCACCAC CTTGCCGACC GGGTCGGACG CCCGCAGGCC CGCCTGCTCG 
CACGGCGGCT GGAAGGGGTC GTCCGGCCGC AACGGGAAGG CCGGCCTCAC GTCGGGGC6G 
GGGTC6ACGG TCGGGGCATC CTTCGAGGAG GGCATACGCC AGGCTTGCAA CGACGCCTC6 
AAGCGCGCTC AACGCGCGCT CGCTCCACCG TCCTTCGAGC GGCCCCCCAG CTGCGGTGAC 
CACACTCTGC GGCTACCGCC TCACAGCCCC GACCGAGC6A TGGTTCCCAT GGACAGCTTC 
CTGATCCTCG CCCGCATGTC CCCCTCGTCG GAGAAGGAGG TGGCGCGCCT GTTCGCCGAC 
TCCGAC6AGG CCACCGAGCT GCC6GAGGTG GCCGGGACGG TCAGCCGdAG CCTCCTGTCG 
TTCCACGGCC TGTACTTCCA CCTGACGGAG CTCGAGGAGA GCACGGACAG GACGCTCAAC 
GGCATCCACG AACACCCCGA GTTCGTCCGG CTGAGCCGCC AGCTGTCCGG TCACGTCCAG 
6CGTAC6ACC CGAAGACGTG GCGCTCGCCC GCCGACGCCA TGGCCCGCGA GTTCTACCGG 
TGGGAGG0G6 GGACCGGCGT CGTGCGCCGC TGACCCGTCC CGAGTCCCAC CGGTCGCACG 
TTCCTCACTC TCCGTTGACT CCCTTCCTCG ATAGCGTCAT CGTTGGTGGC CCACCTGGAC 
GACGGAGCCA TCTGAGGGGA AGCGTTGGGT ACCGATACTC TCCCGAGACT CACCGACGCC 
GGAGAGCTC 

(2) INFORMATION FOR SZQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1278 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA ^ 
(lii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



7440 

7500 
7560 
7620 
7680 
7740 
7800 
7860 
7920 
7980 
8040 
8100 
8160 
8169 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
GTGAGCCGAC CGCAGGGGGG CGGGCCGCGC CGCGTCGCGA TCACCGGCAT GGGGGTCGTC 60
GCGCCCGGCG GCTCGGGCCG GAAGGCGTTC TGGAACCTGC TGACCGACGG CCGCACCGCG 120
ACCCGGAAGA TCTCGCTGTT CGACCCGGCG GGCTTCCGGT CCCGGATCGC CGCCGAGTGC 180
GACTTCGACC CCGCCGCCGA GGGGCTGACG CCCCGCGAGG TCCGGCGCAT GGACCGGGCC 240
GCGCAGCTCG CGGTGGTGTC GGCGCGCGAG GCGCTCGCCG ACAGCGGGCT GGTGGCGGGC 300
GAGGGCGACC CGGCGCGGTT CGCGGTGTCG CTCGGCAGCG CCGTCGGCTG CACGATGGGG 360
CTGGAGGACG AGTACGTCGT GGTCAGCGAC CAGGGCCGCG ACTGGCTGGT CGACCACTCC 420
TACGGCGTGC CGCACCTGTA CCGGCACCTG GTGCCCAGCT CGCTGGCGGC CGAGGTCGCC 480
[illegible] [illegible] CCCGGTCACG CTGATCTCGA CGGGCTGCAC CTCCGGGCTC 540
[illegible] GGCACGGCGC GCGCGTCATC GCCGAGGGCT CGGCGGACGT GGCGCTCGCC 600
GGGGCCACCG ACGCGCCCAT CTCGCCGATC ACGGTGGCCT GCTTCGACGC CATCCGGGCG 660
ACCTCGCCGA ACAACGACGA CCCCGAGCAC GCGTCCCGGC CGTTCGACCG GGAGCGCAAC 720
GGGTTCGTGC TCGGCGAGGG CGCGGCGGTG TTCGTCCTGG AGGAGCTGGA GCACGCCCGC 780
CGCCGGGGCG CGCACGTCTA CTGCGAGGTC GCGGGGTACG CCACGCGCGG CAACGCCTAC 840
CACATGACGG GCCTGAAGCC CGACGGCCGC GAGATGGCCG AGGCGATCAG GGTGGCGATG 900
GACGCCGCCC GGGTCGCCCC GGCCGACCTC GACTACATCA ACGCGCACGG CTCGGGCACC 960
AAGCAGAACG ACCGGCACGA GACGGCCGCG TTCAAGCGCA GCCTCGGCGA GCGCGCCTAC 1020
GAGCTGCCGG TCAGCTCCAT CAAGTCGATG GTCGGGCACT CGCTCGGCGC GATCGGCTCG 1080
ATCGAGCTGG CCGCGTGCGC GCTGGCGATC GAGCACGGTG TGGTGCCGCC GACCGCCAAC 1140
CTGCACAACG CCGACCCCGA ATGCGACCTG GACTACGTGC CGCTGGTGGC GCGCGAGGGC 1200
CGCATCCGCA CGGTGCTGAG CGTGGGCAGC GGCTTCGGCG GCTTCCAGTC CGCCACCGTC 1260
CTGCGGGAGG CCGCGTGA 1278



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1223 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CSTGAGCfSTCf! ^wAwCGCGCA CGCGCCGGCG GTCACCGGGA TCGGCGTGGT CGCGCCGACC 60
[illegible] TCGAGGAGCA CTGGGCGGCG ACGTTGCGCG GCGTCCCGGT CATCGGGCCG 120
[illegible] TCGACGCCTC GCGCTACCCG TCGCCGTTCG GCGGCGAGGT GCCCGGGTTC 180
[illegible] AGCGCGTCCC GGGGCGGCTC ATCCCGCAGA CCGACCACTG GACGCACCTG 240
[illegible] WwAwwGACCT CGCCCTCGCC GACGCGGGCG TGGTCCCGGC CGAGCTGCCC 300
GAGTACGAC& [illegible] GACCGCCAGC TCGTCGGGCG GCGTGGAGTT CGGGCAGCGC 360
GACaXTCCAGG [illegible] wwACGGGCCC CGGCACGTCG GGGCTACCAG TCGATCGCCT 420
GGTTCTACGC GGCGACGACC GGCCAGATCT CCATCCGGCA CGGGATGCGC GGCCCCTGCG 480
GCGTCGTGGT CGCCGAGCAG GCCGGGGCGC TGGAGTCGTT CGCGCAGGCC CGCCGCTACC 540
TGGCGGACGG GGCGCGGGTG GTGGTGTCCG GCGGCACCGA CGCGCCGTTC AGTCCGTACG 600
GCCTGACCTG CCAGCTCCGC AGCGGGCGGC TTAGCACGGG TGCCGACCCG GCCCGCGCCT 660
ACCTGCCGTT CGACGCCGCC GCGAACGGCT TCGTGCCGGG CGAGGGCGGC GCGATCCTCA 720
TCATCGAGCA AGCCGCCACC GCGCAGGACC GCTCCTACGG GCGGATCGCG GGCTACGCGG 780
CGACCTTCCA CCCGCCGCCG GGCTCGGGCC GCCCTCCGAC GCTGGAGCGA GCCGTGCGCG 840
CCGCCTTGGA CGACGCCCGG CTCACACCCG CCGACGTGGA CGTGGTGTTC GCCGACGCGG 900
CGGGCGTCCC GGATCTGGAC CGCGCGGAGG CCGACGCGAT CGGCGCGGTC TTCGGGCCGC 960
GCGGCGTGCC CGTCACCGCG CCCAAGAGCC TGACCGGCCG CCTGTACGCG GGCGGCCCCG 1020
CGCTCGACGC CGCGACGGCG CTGCTGGCCA TGCACGACTC GGTGATCCCG CCGACGGCCG 1080
GCGGCGCGGA CGTCCCGCCC GGCTACGCGC TCGCCCTGGT CGGCGCGGAA CCGCGCCCGG 1140
CCCGGCTGCG CACCGCACTG ATCATCGCCC GCGGCTACGG GGGCTTCAAC GCCGCCCTGG 1200
TGCTGCGCGG CCCGAACACC TGA 1223

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 264 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

ATGGCAACCC GCGAACGCAC CATCGACGAC CTCCGCGCCC TGATGCGCGC CGCCGTCGGC 60

GAGGCCGACG ACATCGACCT GGACGGCGAC ATCCTCGACT CCACCTTCAC CGAGCTGCAG 120

TACGACTCGC TCGCCGTGCT CGAGCTCCCG GCCCGCATCC AGACGCAGTG GGGCGTGCTG 180

ATCCCCGAGG ACGACGCGTC CGGGCTGGAG ACCCCCCCCA TGTTCCTCGA CTACGTCAAC 240

GCGCGGGCCG TCGCCGAGCG ATGA 264
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 462 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
ATGACGCAGT GGCGCACCGA CAGCGTGATC GTGATCGACG CGCCGCTCGA CGTCGTCTGG 60
GACATGACCA ACGACGTCGC CTCCTGGCCG GAGCTGTTCG ACGAGTACGC CTCGGCCGAG 120
ATCCTGGAGC GCGACGGCCA CACCGTCCGC TTCCGGCTGA CGATGCACCC CGACGCCGAC 180
GGCAACGCCT GGTCGTGGGT GTCGGAGCGC ACGCCCGACC GCGCCGCCCT CACCGTCAAC 240
GCGCACCGCG TGGAGACCGG CTGGTTCGAG CACATGAACC TGCGCTGCGA CTACCGCGAG 300
GTCCCCGGCC GCGTGGAGAT GCGCTCGCGG CAGGACTTCG CGATCAACGA GGCGTCGCCG 360
GTCTCGCTGG CGGCGATGAC CGAGCGCATC CAGAGCAACT CCCCCGTCCA GATGAAGCTG 420
ATCAAGGACA AGGTGGAGCG GGCGGCCCGG GGCGCGCGGT GA 462
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 462 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GTGATCCAGT TCCTGCTCCC GGTCGCGCTC CTCGGCAACG GGTTGTGCCC GGGCGTCCTG 60 

ACGGGCAGCG TCCTCGGCGT CGTGCCGTAC TACCGGACGC TGCCCGAGGA CCGCTACATC 120 

GCCGCGCACG CCTTCGCGGT CGGCCCCTAC GACCCGTTCC AGCCGGTGTG CCTGCTGGTC 180 

ACGGTGGCGG CCGACGCGGT CGCGGCGGCG GTCGCGCCGA CCGCCGCCGC CCGGGTGCTC 240

TGCGCGCTCG CCGCCGTGCT CGCGCTGGCG GTGGTGGCGA TCTCGCTCAC CCGCAACGTG 300

CCGATGAACC GCCGGATCAA GCGGCTCGAC CCGGCCGCGC CGCCCGCCGG GTTCAGCGCG 360

CCCGCGTTCC TGCGCCGCTG CGCGGGCTCG AACGCGGCGC GCACCGGCCT GACGCTGGCC 420

GCCCTGCTCA GCAACACGGC CGCCCTCGGC GTGCTGCTGT GA 462
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1026 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



ATGACCGAAC CGGAAGGACC GCACGCCGCG AGCCTGCGGC TCCAATCTCT GCTGGACGGC 60
ATGCGCGTCG CCAAGGTCGT GCAGGTGCTC GCCGAACTCC AGGTGGCCGA CGCGGTCGCC 120
GACGGCCCCT GCAAGCCCGC CGAGATCGCC GCCGACGTCG GCGCCGACCC CGACGCGCTG 180
TACCGGGTGC TGCGCTGCGC CGCCTCGTTC GGGGTGTTCA CCGAGGACGA GGACGGCCGG 240
TTCGGGCTCA CCCCGATGGC CGCGCTGCTG CGCACCGGCA CCGACGACAG CCACCGCGAC 300
CTGTTCATGA TGGCGGCGGG CGACCTGTGG TGGCGGCCGT ACGGCGAGCT GCTGGAGACG 360
GTGCGGACCG GCCGCCCCGC CGCCGAGCTG GCGTTCGGGA TGCCGTTCTA CGACTACCTC 420
GGCACCGACC CGGCCGCCGC CGGGCTCTTC GACCGCGCGA TGACGCAGGT CAGCAAGGGC 480
CAGGCGAAGG CGATCCTCGG CCGCTGCTCG TTCGAGCGGT ACGCGCGGAT CGCCGACGTG 540
GGCGGCGGCC ACGGCTACTT CCTCGCGCAG GTGTTGCGCA GCAGCCCGCG CACCGAGGGC 600
GTGCTGCTGG ACCTGCCGCA CGTGGTGGCC GGAGCCCCGG CGGTGCTGGA GAAGCACGAG 660
GTCGCCGACC GCGTCCAGGT CGTCCCGGGC AGCTTCTTCG ACGCGCTGCC CACCGGCTGC 720
GACGCCTACC TGCTGAAAGC GATCCTCATC AACTGGCCCG ACGCCGACGC CGAACGCATC 780
CTGCACCGGG TGCGCGAGGC GATCGGCACC GACCGCGACG CGCGGCTGCT GGTGGTCGAG 840
CCCGTCGTCC CGCCCGGCGA CGTCCGCGAC TACAGCAAGG CCACCGACAT CGACATGCTC 900
GCCATCATCG GCGGGCGGCA GCGCACCGTC GCCGAGTGGC GGCGGCTGCT GCGCGCGGGC 960
GGCTTCGAGC TGGTGGGCGA GCCCACGCCG GGCCGCCCCG AGGTCATGGA GTGCCGCCCC 1020
ATCTGA 1026

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 741 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:



ATGACCGACA CATCGTTCGC CGGCAAGAAC GCGCTGATCA CCGGCGGCAC CCGGGGCATC 60
GGCCGGGCCG TCGCGCTCGG CCTGGCCCGC GCCGGGGCCA ATGTCACCGT CTGCTACCGC 120
AGCGACGCCG AGTCCGCCGC CGCGATGGAA GCCGAGCTGG CCGCCACCGA CGGCAAGCAC 180
CACGTGCTCC AGGCCGACAT CGGCAACGCC GGGGACGTCC GCCCCCTGCT GGACGAGGTC 240
GCCGCCCGCA TGGGCTCGCT CGACGTAGTC GTGCACAACG CCGGCCTGAT CAGCCACGTG 300
CCGTTCGCCG ACCTGGAGCC CGAGGAGTGG CACCGGATCG TCGACTCCAA CCTGACCGGC 360
ATGTACCTGG TGGTGCGGGC CGCGCTGCCG CTGCTGTCGG AGGGCGGCGC GGTCGTCGGC 420
GTCGGCTCCA AGGTCGCGCT CGTCGGCATC TCGCAGCGCA CCCACTACAC CGCCGCCAAG 480
GCCGGGCTCA TCGGGTTCGT GCGCTCGCTC AGCAAGGAGC TGGGGCCGCT CGGCATCCGG 540
GTCAACCTGG TCGCGCCCGG CATCACCGAG ACCGACCAGG CCGCGCACCT GCCCCCCGTG 600
CAGCGCGAGC GCTACCAGAG CATGACCGCG CTCAAGCGGC TCCGCCAGGC CGACGAGGTC 660
GCCGACGTGG TGCTCTTCCT CGCCGGTCCC GGCGCGCGCT ACGTCACCGG CGAGACCGTC 720
AACGTGGACG GGGGGATGTG A 741


(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 342 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CTGACCATGG CCGACAGCGG CCCGGTGTTC CGGGTGATGC TCCGGATGGA GATCGTCCCC 60 

GGCAGGGACG CGCAGTTCGA GCGGGTCTGG TACTCGGTCC GCGACACCGT CAGCGGCAAC 120 

CCCCCCAACC TCGGCCAGTG CGTGCTGCGC ACCGACGACG AGGAGAGCGT CTACTACATC 180

ATGACCGACT GCATCGACGA CCCGCGGTTC CGCGAGTTCG AGCGCAGCGA CGGCCACGTC 240

GACCACCGCC GCAAGCTCCA CCCCTACCGG GTGAAGGGCA GCATGGCGAC GATGAAGGTC 300

GTCCACGACC TCGGCCGCGC GGCGGCCGAG CCGGTCCGGT GA 342
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 312 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GTGACGCCCG GGCAGGTGCG GGTCCTGGTC CGCTACCAGG CTCCGGGCGA CGACCCCGAG 60 
GCCGTCGTCC AGGCGTACAA GCTGCTCTGC GAGGAACTCC GCGGGACGCC CGGCCTGCTC 120 
GGCAGCGAGC TGCTGGCGTC CACGCTCGAC GAGGGACGGT TCGCGGTGCT GAGCCTGTGG 180
ACCGACGCCG CGCGGTTCCA GGAATGGGAG CAGGGCCCGG CGCACAACGG CCACACGTCC 240
GGCCTCCGCC CGTTCCGGGA CACCTCCTCG GGGCGCGGCT TCGATTTCTA CGAAGTGGTC 300
CACGCCCTGT AA 312

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1236 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
ATCCCCTCCT CGAAGGATGC CCCGACCGTC GACCCCCGCC CCGACGTCAC GCCGGCCTTC 60
CCGTTCCCGC CGGACGACCC CTTCCAGCCG CCGTGCGAGC ACGCGCGCCT GCGCGCGTCC 120
GACCCGGTCG CCAAGGTGGT GCTGCCGACC GGCGACCACG CGTGCGTCGT GACGCGCTAC 180
GCCGACGTCC GGTTCGTCAC CTCGGACCGG CGCTTCAGCA AGGAGGCGGT GACCCGGCCG 240
GGCCCCCCGC GCCTGATCCC GATGCAGCGC GGCTCCAAGT CGCTGGTCAT CATGGACCCG 300
CCCGACCACA CGAGGATGCG CAAGATCGTG TCTCGCGCGT TCACCGCCCG TCGTGTGGAC 360
CGCATGCGCG CGCACGTGCG CGACCTGACG TCGGGGTTCG TGGACGAGAT GGTCGAGCAC 420
GGCCCGCCCG CCGACCTGAT CGCGCACCTG GCGCTGCCGC TGCCGGTCAC CGTCATCTGC 480
GAGATGCTGG GCCTTCCGCC GGAGGATCGG CCGCGCTTCC AGGACTCGAC CGACCGGATG 540
CTCACCATCG GCGCGCCCGC TCTCCCGCAG GCCGACGAGA TCAAGGCCGC GGTCGGGCGG 600
CTGCGCGGCT ACCTCGCCGA GCTGATCGAC GCCAAGACGG CCGCGCCCGC CGACGACCTG 660
CTGTCGTTGC TGAGCCGCGC GCACGCCGAC GACGGCCTCA GCGAGGAGGA ACTGCTCACC 720
TTCGGCATGA CGCTGCTGGC GGCGGGTTAC CACACCACCA CGGCGGCGAT CACGCACTCG 780
GTCTACCACC TGCTGCGCGA GCCGTCGCGG TACGCGCGGC TGCGCGAGGA CCCGTCGGCT 840
ATCCCGGCGG CCGTTGAGGA ACTGCTACGG TACGGGCAGA TCGGCGGCGG CGCGGGCGCG 900
ATCCGCATCG CGGTCGAGGA TGTGGAACTC GGCGGCACCC TCGTGCGCGC GGGCGAGGCG 960
GTCATCCCGC TTTTCAACGC CGCCAACCGC GATCCGGAGG TGTTCGCCGA TCCCGAGGAA 1020
CTCGACCTCG GCCGTACCGA CAACCCGCAC ATCGCGCTCC GCCACGGCAT CCACTACTGC 1080
CTCCGCGCGC CGCTCGCCCC GCTGGACCTT CAGGTCGTCC TGGACACCCT GCTGCAGCCG 1140
ACCCCCGCGC TGCGTCTCGC TATCGACGAT GCCGACATCA CCTGGCGGCC CGGCTTGGCG 1200
TTCGCCCCGC CGGACGCGCT GCCCATCGCC TGGTAG 1236

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 347 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

ATGGACAGGT TCCTGATCGT CGCCCGCATG TCCCCCTCGT CGGAGAAGGA GGTGGCGCGC 60 

CTCTTCCCCG AGTCCGAACG AGGGCACCGA GCTGCCGGAG GTGGCCGGGA CGGTCAGCCG 120 

CAGCCTGCTG TCGTTCCACG CCCTCTACTT CCACCTGACG GACGTGGAGG AGAGCACGGA 180 

CAGGACGCTG AACGGCATCC ACGAACACCC CGAGTTCGTC CGGCTGAGCC GCCAGCTGTC 240 

CGGTCACGTC CAGGCGTACG AACCCGAAGA CGTGGCGCTC GCCCGCCGAC GCCATGGCCC 300

GCGAGTTCTA CCGGTGGGAG GCGGGGACCG GCGTCGTGCG CCGCTGA 347
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 425 amino acids 






(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Ser Arg Pro Gln Gly Gly Gly Pro Arg Arg Val Ala Ile Thr Gly
1 5 10 15

Met Gly Val Val Ala Pro Gly Gly Ser Gly Arg Lys Ala Phe Trp Asn
20 25 30

Leu Leu Thr Asp Gly Arg Thr Ala Thr Arg Lys Ile Ser Leu Phe Asp
35 40 45

Pro Ala Gly Phe Arg Ser Arg Ile Ala Ala Glu Cys Asp Phe Asp Pro
50 55 60

Ala Ala Glu Gly Leu Thr Pro Arg Glu Val Arg Arg Met Asp Arg Ala
65 70 75 80

Ala Gln Leu Ala Val Val Ser Ala Arg Glu Ala Leu Ala Asp Ser Gly
85 90 95

Leu Val Ala Gly Glu Gly Asp Pro Ala Arg Phe Ala Val Ser Leu Gly
100 105 110

Ser Ala Val Gly Cys Thr Met Gly Leu Glu Asp Glu Tyr Val Val Val
115 120 125

Ser Asp Gln Gly Arg Asp Trp Leu Val Asp His Ser Tyr Gly Val Pro
130 135 140

His Leu Tyr Arg His Leu Val Pro Ser Ser Leu Ala Ala Glu Val Ala
145 150 155 160

Trp Ala Gly Gly Ala Glu Gly Pro Val Thr Leu Ile Ser Thr Gly Cys
165 170 175

Thr Ser Gly Leu Asp Ala Val Gly His Gly Ala Arg Val Ile Ala Glu
180 185 190

Gly Ser Ala Asp Val Ala Leu Ala Gly Ala Thr Asp Ala Pro Ile Ser
195 200 205

Pro Ile Thr Val Ala Cys Phe Asp Ala Ile Arg Ala Thr Ser Pro Asn
210 215 220

Asn Asp Asp Pro Glu His Ala Ser Arg Pro Phe Asp Arg Glu Arg Asn
225 230 235 240

Gly Phe Val Leu Gly Glu Gly Ala Ala Val Phe Val Leu Glu Glu Leu
245 250 255

Glu His Ala Arg Arg Arg Gly Ala His Val Tyr Cys Glu Val Ala Gly
260 265 270

Tyr Ala Thr Arg Gly Asn Ala Tyr His Met Thr Gly Leu Lys Pro Asp
275 280 285

Gly Arg Glu Met Ala Glu Ala Ile Arg Val Ala Met Asp Ala Ala Arg
290 295 300

Val Ala Pro Ala Asp Leu Asp Tyr Ile Asn Ala His Gly Ser Gly Thr
305 310 315 320

Lys Gln Asn Asp Arg His Glu Thr Ala Ala Phe Lys Arg Ser Leu Gly
325 330 335

Glu Arg Ala Tyr Glu Leu Pro Val Ser Ser Ile Lys Ser Met Val Gly
340 345 350

His Ser Leu Gly Ala Ile Gly Ser Ile Glu Leu Ala Ala Cys Ala Leu
355 360 365

Ala Ile Glu His Gly Val Val Pro Pro Thr Ala Asn Leu His Asn Ala
370 375 380

Asp Pro Glu Cys Asp Leu Asp Tyr Val Pro Leu Val Ala Arg Glu Gly
385 390 395 400

Arg Ile Arg Thr Val Leu Ser Val Gly Ser Gly Phe Gly Gly Phe Gln
405 410 415

Ser Ala Thr Val Leu Arg Glu Ala Ala 
420 425 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 407 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Ser Val Leu Thr Ala Asp Ala Pro Ala Val Thr Gly Ile Gly Val
1 5 10 15

Val Ala Pro Thr Gly Ile Gly Val Glu Glu His Trp Ala Ala Thr Leu
20 25 30

Arg Gly Val Pro Val Ile Gly Pro Leu Thr Arg Phe Asp Ala Ser Arg
35 40 45

Tyr Pro Ser Pro Phe Gly Gly Glu Val Pro Gly Phe Asp Ala Ala Glu
50 55 60

Arg Val Pro Gly Arg Leu Ile Pro Gln Thr Asp His Trp Thr His Leu
65 70 75 80

Ala Leu Ala Ala Thr Asp Leu Ala Leu Ala Asp Ala Gly Val Val Pro
85 90 95

Ala Glu Leu Pro Glu Tyr Glu Met Ala Val Val Thr Ala Ser Ser Ser
100 105 110

Gly Gly Val Glu Phe Gly Gln Arg Glu Ile Gln Ala Leu Trp Arg Asp
115 120 125

Gly Pro Arg His Val Gly Ala Tyr Gln Ser Ile Ala Trp Phe Tyr Ala
130 135 140

Ala Thr Thr Gly Gln Ile Ser Ile Arg His Gly Met Arg Gly Pro Cys
145 150 155 160

Gly Val Val Val Ala Glu Gln Ala Gly Ala Leu Glu Ser Phe Ala Gln
165 170 175

Ala Arg Arg Tyr Leu Ala Asp Gly Ala Arg Val Val Val Ser Gly Gly
180 185 190

Thr Asp Ala Pro Phe Ser Pro Tyr Gly Leu Thr Cys Gln Leu Gly Ser
195 200 205

Gly Arg Leu Ser Thr Gly Ala Asp Pro Ala Arg Ala Tyr Leu Pro Phe
210 215 220

Asp Ala Ala Ala Asn Gly Phe Val Pro Gly Glu Gly Gly Ala Ile Leu
225 230 235 240

Ile Ile Glu Gln Ala Ala Thr Ala Gln Asp Arg Ser Tyr Gly Arg Ile
245 250 255

Ala Gly Tyr Ala Ala Thr Phe Asp Pro Pro Pro Gly Ser Gly Arg Pro
260 265 270

Pro Thr Leu Glu Arg Ala Val Arg Ala Ala Leu Asp Asp Ala Arg Leu
275 280 285

Thr Pro Ala Asp Val Asp Val Val Phe Ala Asp Ala Ala Gly Val Pro
290 295 300

Asp Leu Asp Arg Ala Glu Ala Asp Ala Ile Gly Ala Val Phe Gly Pro
305 310 315 320

Arg Gly Val Pro Val Thr Ala Pro Lys Ser Leu Thr Gly Arg Leu Tyr
325 330 335

Ala Gly Gly Pro Ala Leu Asp Ala Ala Thr Ala Leu Leu Ala Met His
340 345 350

Asp Ser Val Ile Pro Pro Thr Ala Gly Gly Ala Asp Val Pro Pro Gly
355 360 365

Tyr Ala Leu Asp Leu Val Gly Ala Glu Pro Arg Pro Ala Arg Leu Arg
370 375 380

Thr Ala Leu Ile Ile Ala Arg Gly Tyr Gly Gly Phe Asn Ala Ala Leu
385 390 395 400

Val Leu Arg Gly Pro Asn Thr
405 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 87 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Met Ala Thr Arg Glu Arg Thr Ile Asp Asp Leu Arg Ala Leu Met Arg
1 5 10 15

Ala Ala Val Gly Glu Ala Asp Asp Ile Asp Leu Asp Gly Asp Ile Leu
20 25 30

Asp Ser Thr Phe Thr Glu Leu Glu Tyr Asp Ser Leu Ala Val Leu Glu
35 40 45

Leu Ala Ala Arg Ile Glu Thr Gln Trp Gly Val Leu Ile Pro Glu Asp
50 55 60

Asp Ala Ser Gly Leu Glu Thr Pro Arg Met Phe Leu Asp Tyr Val Asn
65 70 75 80

Gly Arg Ala Val Ala Glu Arg 
85 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 153 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Thr Gln Trp Arg Thr Asp Ser Val Ile Val Ile Asp Ala Pro Leu
1 5 10 15

Asp Val Val Trp Asp Met Thr Asn Asp Val Ala Ser Trp Pro Glu Leu
20 25 30

Phe Asp Glu Tyr Ala Ser Ala Glu Ile Leu Glu Arg Asp Gly Asp Thr
35 40 45

Val Arg Phe Arg Leu Thr Met His Pro Asp Ala Asp Gly Asn Ala Trp
50 55 60

Ser Trp Val Ser Glu Arg Thr Pro Asp Arg Ala Ala Leu Thr Val Asn
65 70 75 80

Ala His Arg Val Glu Thr Gly Trp Phe Glu His Met Asn Leu Arg Trp
85 90 95

Asp Tyr Arg Glu Val Pro Gly Gly Val Glu Met Arg Trp Arg Gln Asp
100 105 110

Phe Ala Met Lys Glu Ala Ser Pro Val Ser Leu Ala Ala Met Thr Glu
115 120 125

Arg Ile Gln Ser Asn Ser Pro Val Gln Met Lys Leu Ile Lys Asp Lys
130 135 140

Val Glu Arg Ala Ala Arg Gly Ala Arg
145 150 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 153 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



Met Ile Glu Phe Leu Leu Pro Val Ala Leu Leu Gly Asn Gly Leu Cys
1 5 10 15

Ala Gly Val Leu Thr Gly Ser Val Leu Gly Val Val Pro Tyr Tyr Arg
20 25 30

Thr Leu Pro Glu Asp Arg Tyr Ile Ala Ala His Ala Phe Ala Val Gly
35 40 45

Arg Tyr Asp Pro Phe Gln Pro Val Cys Leu Leu Val Thr Val Ala Ala
50 55 60

Asp Ala Val Ala Ala Ala Val Ala Pro Thr Ala Ala Ala Arg Val Leu
65 70 75 80

Cys Ala Leu Ala Ala Val Leu Ala Leu Ala Val Val Ala Ile Ser Leu
85 90 95

Thr Arg Asn Val Pro Met Asn Arg Arg Ile Lys Arg Leu Asp Pro Ala
100 105 110

Ala Pro Pro Ala Gly Phe Ser Ala Pro Ala Phe Leu Arg Arg Trp Ala 
115 120 125 

Gly Trp Asn Ala Ala Arg Thr Gly Leu Thr Leu Ala Ala Leu Leu Ser 
130 135 140 

Asn Thr Ala Ala Leu Gly Val Leu Leu 
145 150 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 341 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Thr Glu Pro Glu Gly Pro His Ala Ala Ser Leu Arg Leu Gln Ser
1 5 10 15

Leu Leu Asp Gly Met Arg Val Ala Lys Val Val Gln Val Leu Ala Glu
20 25 30

Leu Gln Val Ala Asp Ala Val Ala Asp Gly Pro Cys Lys Pro Ala Glu
35 40 45

Ile Ala Ala Asp Val Gly Ala Asp Pro Asp Ala Leu Tyr Arg Val Leu
50 55 60

Arg Cys Ala Ala Ser Phe Gly Val Phe Thr Glu Asp Glu Asp Gly Arg
65 70 75 80

Phe Gly Leu Thr Pro Met Ala Ala Leu Leu Arg Thr Gly Thr Asp Asp
85 90 95

Ser His Arg Asp Leu Phe Met Met Ala Ala Gly Asp Leu Trp Trp Arg
100 105 110

Pro Tyr Gly Glu Leu Leu Glu Thr Val Arg Thr Gly Arg Pro Ala Ala
115 120 125

Glu Leu Ala Phe Gly Met Pro Phe Tyr Asp Tyr Leu Gly Thr Asp Pro
130 135 140

Ala Ala Ala Gly Leu Phe Asp Arg Ala Met Thr Gln Val Ser Lys Gly
145 150 155 160

Gln Ala Lys Ala Ile Leu Gly Arg Cys Ser Phe Glu Arg Tyr Ala Arg
165 170 175

Ile Ala Asp Val Gly Gly Gly His Gly Tyr Phe Leu Ala Gln Val Leu
180 185 190

Arg Ser Ser Pro Arg Thr Glu Gly Val Leu Leu Asp Leu Pro His Val
195 200 205

Val Ala Gly Ala Pro Ala Val Leu Glu Lys His Glu Val Ala Asp Arg
210 215 220

Val Gln Val Val Pro Gly Ser Phe Phe Asp Ala Leu Pro Thr Gly Cys
225 230 235 240

Asp Ala Tyr Leu Leu Lys Ala Ile Leu Ile Asn Trp Pro Asp Ala Asp
245 250 255

Ala Glu Arg Ile Leu His Arg Val Arg Glu Ala Ile Gly Thr Asp Arg
260 265 270

Asp Ala Arg Leu Leu Val Val Glu Pro Val Val Pro Pro Gly Asp Val
275 280 285

Arg Asp Tyr Ser Lys Ala Thr Asp Ile Asp Met Leu Ala Ile Ile Gly
290 295 300

Gly Arg Gln Arg Thr Val Ala Glu Trp Arg Arg Leu Leu Arg Ala Gly
305 310 315 320

Gly Phe Glu Leu Val Gly Glu Pro Thr Pro Gly Arg Arg Glu Val Met
325 330 335

Glu Cys Arg Pro Ile
340 

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 246 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Met Thr Asp Thr Ser Phe Ala Gly Lys Asn Ala Leu Ile Thr Gly Gly
1 5 10 15

Thr Arg Gly Ile Gly Arg Ala Val Ala Leu Gly Leu Ala Arg Ala Gly
20 25 30

Ala Asn Val Thr Val Cys Tyr Arg Ser Asp Ala Glu Ser Ala Ala Ala
35 40 45

Met Glu Ala Glu Leu Ala Ala Thr Asp Gly Lys His His Val Leu Gln
50 55 60

Ala Asp Ile Gly Asn Ala Gly Asp Val Arg Arg Leu Leu Asp Glu Val
65 70 75 80

Ala Ala Arg Met Gly Ser Leu Asp Val Val Val His Asn Ala Gly Leu
85 90 95

Ile Ser His Val Pro Phe Ala Asp Leu Glu Pro Glu Glu Trp His Arg
100 105 110

Ile Val Asp Ser Asn Leu Thr Gly Met Tyr Leu Val Val Arg Ala Ala
115 120 125

Leu Pro Leu Leu Ser Glu Gly Gly Ala Val Val Gly Val Gly Ser Lys
130 135 140

Val Ala Leu Val Gly Ile Ser Gln Arg Thr His Tyr Thr Ala Ala Lys
145 150 155 160

Ala Gly Leu Ile Gly Phe Val Arg Ser Leu Ser Lys Glu Leu Gly Pro
165 170 175

Leu Gly Ile Arg Val Asn Leu Val Ala Pro Gly Ile Thr Glu Thr Asp
180 185 190

Gln Ala Ala His Leu Pro Pro Val Gln Arg Glu Arg Tyr Gln Ser Met
195 200 205

Thr Ala Leu Lys Arg Leu Gly Gln Ala Asp Glu Val Ala Asp Val Val
210 215 220

Leu Phe Leu Ala Gly Pro Gly Ala Arg Tyr Val Thr Gly Glu Thr Val
225 230 235 240

Asn Val Asp Gly Gly Met 
245 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 113 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Val Thr Met Ala Asp Ser Gly Pro Val Phe Arg Val Met Leu Arg Met
1 5 10 15

Glu Ile Val Pro Gly Arg Glu Ala Glu Phe Glu Arg Val Trp Tyr Ser
20 25 30

Val Gly Asp Thr Val Ser Gly Asn Pro Ala Asn Leu Gly Gln Cys Val
35 40 45

Leu Arg Ser Asp Asp Glu Glu Ser Val Tyr Tyr Ile Met Ser Asp Trp
50 55 60

Ile Asp Glu Ala Arg Phe Arg Glu Phe Glu Arg Ser Asp Gly His Val
65 70 75 80

Glu His Arg Arg Lys Leu His Pro Tyr Arg Val Lys Gly Ser Met Ala
85 90 95

Thr Met Lys Val Val His Asp Leu Gly Arg Ala Ala Ala Glu Pro Val
100 105 110

Arg 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Val Thr Ala Gly Gln Val Arg Val Leu Val Arg Tyr Gln Ala Pro Gly
1 5 10 15

Asp Asp Pro Glu Ala Val Val Gln Ala Tyr Lys Leu Val Cys Glu Glu
20 25 30

Leu Arg Gly Thr Pro Gly Leu Leu Gly Ser Glu Leu Leu Ala Ser Thr
35 40 45

Leu Asp Glu Gly Arg Phe Ala Val Leu Ser Leu Trp Ser Asp Ala Ala
50 55 60

Arg Phe Gln Glu Trp Glu Gln Gly Pro Ala His Lys Gly Gln Thr Ser
65 70 75 80

Gly Leu Arg Pro Phe Arg Asp Thr Ser Ser Gly Arg Gly Phe Asp Phe
85 90 95

Tyr Glu Val Val His Ala Leu
100

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 411 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Met Pro Ser Ser Lys Asp Ala Pro Thr Val Asp Pro Arg Pro Asp Val 
1 5 10 15 

Thr Pro Ala Phe Pro Phe Arg Pro Asp Asp Pro Phe Gln Pro Pro Cys
20 25 30

Glu His Ala Arg Leu Arg Ala Ser Asp Pro Val Ala Lys Val Val Leu
35 40 45

Pro Thr Gly Asp His Ala Trp Val Val Thr Arg Tyr Ala Asp Val Arg
50 55 60

Phe Val Thr Ser Asp Arg Arg Phe Ser Lys Glu Ala Val Thr Arg Pro
65 70 75 80

Gly Ala Pro Arg Leu Ile Pro Met Gln Arg Gly Ser Lys Ser Leu Val
85 90 95

Ile Met Asp Pro Pro Glu His Thr Arg Met Arg Lys Ile Val Ser Arg
100 105 110

Ala Phe Thr Ala Arg Arg Val Glu Gly Met Arg Ala His Val Arg Asp
115 120 125

Leu Thr Ser Gly Phe Val Asp Glu Met Val Glu His Gly Pro Pro Ala
130 135 140

Asp Leu Ile Ala His Leu Ala Leu Pro Leu Pro Val Thr Val Ile Cys
145 150 155 160

Glu Met Leu Gly Val Pro Pro Glu Asp Arg Pro Arg Phe Gln Asp Trp
165 170 175

Thr Asp Arg Met Leu Thr Ile Gly Ala Pro Ala Leu Ala Gln Ala Asp
180 185 190

Glu Ile Lys Ala Ala Val Gly Arg Leu Arg Gly Tyr Leu Ala Glu Leu
195 200 205

Ile Asp Ala Lys Thr Ala Ala Pro Ala Asp Asp Leu Leu Ser Leu Leu
210 215 220

Ser Arg Ala His Ala Asp Asp Gly Leu Ser Glu Glu Glu Leu Leu Thr
225 230 235 240

Phe Gly Met Thr Leu Leu Ala Ala Gly Tyr His Thr Thr Thr Ala Ala
245 250 255

Ile Thr His Ser Val Tyr His Leu Leu Arg Glu Pro Ser Arg Tyr Ala
260 265 270

Arg Leu Arg Glu Asp Pro Ser Gly Ile Pro Ala Ala Val Glu Glu Leu
275 280 285

Leu Arg Tyr Gly Gln Ile Gly Gly Gly Ala Gly Ala Ile Arg Ile Ala
290 295 300

Val Glu Asp Val Glu Val Gly Gly Thr Leu Val Arg Ala Gly Glu Ala
305 310 315 320

Val Ile Pro Leu Phe Asn Ala Ala Asn Arg Asp Pro Glu Val Phe Ala
325 330 335

Asp Pro Glu Glu Leu Asp Leu Gly Arg Thr Asp Asn Pro His Ile Ala
340 345 350

Leu Gly His Gly Ile His Tyr Cys Leu Gly Ala Pro Leu Ala Arg Leu
355 360 365

Glu Leu Gln Val Val Leu Glu Thr Leu Val Glu Arg Thr Pro Ala Leu
370 375 380

Arg Leu Ala Ile Asp Asp Ala Asp Ile Thr Trp Arg Pro Gly Leu Ala
385 390 395 400

Phe Ala Arg Pro Asp Ala Leu Pro Ile Ala Trp
405 410 

(2) INFORMATION FOR SEQ ID NO:23:
(i) SEQUENCE CHARACTERISTICS:






(A) LENGTH: 114 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

Met Asp Arg Phe Leu Ile Val Ala Arg Met Ser Pro Ser Ser Glu Lys
1 5 10 15

Glu Val Ala Arg Leu Phe Ala Glu Ser Asp Glu Gly Thr Glu Leu Pro 
20 25 30 

Glu Val Ala Gly Thr Val Ser Arg Ser Leu Leu Ser Phe His Gly Leu 
35 40 45 

Tyr Phe His Leu Thr Glu Val Glu Glu Ser Thr Asp Arg oiir Leu Asn 

50 55 60 

Gly Ile His Glu His Pro Glu Phe Val Arg Leu Ser Arg Gln Leu Ser
65 70 75 80

Gly His Val Gln Ala Tyr Asp Pro Lys Thr Trp Arg Ser Pro Ala Asp
85 90 95 

Ala Met Ala Arg Glu Phe Tyr Arg Trp Glu Ala Gly Thr Gly Val Val 

100 105 110 

Arg Arg 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "probe"
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
GGCGCCCAGG CCCCGCTCAC GATGGTCTCC ACCCGCTGCA CCTCCGGCCT
(2) INFORMATION FOR SEQ ID NO:25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "probe"

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
CCCGTCACCT CCATCAAGTC CATGCTCGGC CACTCGCTCG GCCCGATCGG 



54

WE CLAIM:

1. A substantially pure nucleic acid comprising a nucleic acid
encoding a polypeptide sharing at least about 75% amino acid identity
with an Actinomadura polyketide synthase.

2. The nucleic acid of claim 1, encoding a polypeptide sharing 
at least about 80% amino acid identity with an Actinomadura polyketide 
synthase. 



3. The nucleic acid of claim 2, encoding a polypeptide sharing
at least about 90% amino acid identity with an Actinomadura polyketide
synthase.

4. The substantially pure nucleic acid of claim 1, comprising a
nucleic acid selected from the group consisting of SEQ ID NO: 1-12.

5. A transformed eukaryotic or prokaryotic cell comprising the
nucleic acid of claim 1.

6. A vector capable of reproducing in a eukaryotic or
prokaryotic cell comprising the nucleic acid of claim 1.

7. A substantially pure nucleic acid comprising a nucleic acid 
that hybridizes to the nucleic acid of claim 1 under stringent conditions. 


8. A substantially pure nucleic acid comprising a nucleic acid 
encoding a polypeptide sharing at least about 75% amino acid identity 
with a polyketide synthase for biosynthesis of a 
benzo(a)naphthacenequinone.






9. The substantially pure nucleic acid of claim 8, encoding a
polypeptide sharing at least about 80% amino acid identity with a 
polyketide synthase for biosynthesis of a benzo(a)naphthacenequinone. 

10. The nucleic acid of claim 9, encoding a polypeptide sharing 
at least about 90% amino acid identity with a polyketide synthase for 
biosynthesis of a benzo(a)naphthacenequinone. 

11. The nucleic acid of claim 10, wherein the polyketide
synthase is an Actinomadura polyketide synthase. 

12. The nucleic acid of claim 11, wherein the polyketide 
synthase is an Actinomadura polyketide synthase. 

13. The nucleic acid of claim 12, wherein the polyketide 
synthase is an Actinomadura polyketide synthase. 

14. The nucleic acid of claim 8, wherein the 
benzo(a)naphthacenequinone is a dihydrobenzo(a)naphthacenequinone
aglycon.

15. The nucleic acid of claim 9, wherein the
benzo(a)naphthacenequinone is a dihydrobenzo(a)naphthacenequinone
aglycon.

16. The nucleic acid of claim 10, wherein the
benzo(a)naphthacenequinone is a dihydrobenzo(a)naphthacenequinone
aglycon.

17. The nucleic acid of claim 14, wherein the
dihydrobenzo(a)naphthacenequinone aglycon is pradimicin.

18. The nucleic acid of claim 15, wherein the
dihydrobenzo(a)naphthacenequinone aglycon is pradimicin.

19. The nucleic acid of claim 16, wherein the
dihydrobenzo(a)naphthacenequinone aglycon is pradimicin.

20. A substantially pure polypeptide comprising an amino acid
sequence sharing at least about 75% amino acid identity with an
Actinomadura polyketide synthase.

21. The polypeptide of claim 20, comprising an amino acid
sequence sharing at least about 80% amino acid identity with an
Actinomadura polyketide synthase.

22. The polypeptide of claim 21, comprising an amino acid
sequence sharing at least about 90% amino acid identity with an
Actinomadura polyketide synthase.

23. The polypeptide of claim 22, comprising an amino acid
sequence selected from the group consisting of SEQ ID NO: 13, SEQ ID
NO: 14 and SEQ ID NO: 15.

24. A method of preparing pradimicin or an analog thereof 
comprising: 

(a) transforming a eukaryotic or prokaryotic cell with an
expression vector for expressing intracellularly or extracellularly a nucleic 
acid comprising a nucleic acid encoding a polypeptide sharing at least 
about 70% amino acid identity with an Actinomadura polyketide 
synthase; 

(b) growing the transformed cell in culture; and

(c) isolating the pradimicin or analog thereof from the
transformed cell or the culture medium.

25. The method of claim 24, wherein the polypeptide shares at 
least about 80% amino acid identity with an Actinomadura polyketide 
synthase. 

26. The method of claim 25, wherein the polypeptide shares at 
least about 90% amino acid identity with an Actinomadura polyketide 
synthase. 

27. The method of claim 24, wherein the nucleic acid 
comprises SEQ ID NO:1.
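
The identity thresholds recited in the claims (about 75%, 80% and 90%) can be illustrated with a short calculation. What follows is a minimal sketch and not the method of the specification: it assumes the two sequences have already been aligned to equal length, that gaps are written as "-", and that identity is counted only over columns in which neither sequence has a gap. The function names `percent_identity` and `meets_threshold` are hypothetical. The example strings are the actinorhodin and tetracenomycin keto synthase motifs as read from FIGURE 1.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned, non-gap columns (an assumed definition)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True when the percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Keto synthase motifs as read from FIGURE 1 (one-letter amino acid code).
actinorhodin = "GAEGPVTMVSTGCTSGLD"
tetracenomycin = "GAEGPVTVVSTGCTSGLD"

identity = percent_identity(actinorhodin, tetracenomycin)
print(f"{identity:.1f}% identity")
print("meets 90% threshold:", meets_threshold(actinorhodin, tetracenomycin, 90.0))
```

Under these assumptions the two 18-residue motifs differ at one position, so they share 17 of 18 residues. A different treatment of gaps or a different denominator would change the figure, so the definition in the specification governs.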
FIGURE 1

β-Keto synthase

Granaticin GAEGPVTMVSDGCTSGLD
Tetracenomycin GAEGPVTVVSTGCTSGLD

Actinorhodin GAEGPVTMVSTGCTSGLD

CONSENSUS GAEGPVTMVSTGCTSGLD
Probe 1 (54 mer) 5'-GGCGCGGAGGGCCCGGTCACGATGGTCTCCACCGGCTGCACCTCGGGCCTGGAC-3'

Acyl transferase

Granaticin PVSSIKSyGGHSLGAIGS
Tetracenomycin PVSSIKSMIGHSLGAIGS
Actinorhodin PVSSIKSMVGHSLGAIGS

CONSENSUS PVSSIKSM.()GHSLGAIGS

Probe 2 (54 mer) 5'-CCCGTCAGCTCCATCAAGTCCATGGTCGGCCACTCGCTCGGCGCGATCGGCTCC-3'
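
The relationship between a 54-mer probe and the 18-residue motif it was designed from can be checked by translating the probe in frame. The sketch below is an illustration only: it uses the standard genetic code, the hypothetical helper name `translate`, and the Probe 2 sequence as read from FIGURE 1, with the one non-nucleotide character in the scan ("6") assumed to be G.

```python
from itertools import product

# Standard genetic code, built in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1 into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Probe 2 as read from FIGURE 1 (the scanned "6" is assumed to be G).
PROBE_2 = "CCCGTCAGCTCCATCAAGTCCATGGTCGGCCACTCGCTCGGCGCGATCGGCTCC"

peptide = translate(PROBE_2)
print(len(PROBE_2), "nt ->", peptide)
```

Translated this way, the 54 nucleotides give the 18-residue actinorhodin acyl transferase motif PVSSIKSMVGHSLGAIGS, which is consistent with the probe having been designed from the conserved region shown in the figure.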



FIGURE 2
A MSRPQGG6PRRVA I TGMGWAPG6SGRKAFWNLLTDGRTATRK I SLFDPAGFRSR I AAEC 
.**,***,** ****^* ****** ***** 4c ***** * 

B MTRHAEKRW I TG I GVRAPGGAGTAAFWDLLTAGRTATRT I SLFDAAPYRSR I AGE I 

1 57 

DFDPAAEGLTPREVRRUDRAACLAWSAREAUDSGLVAGEGDPARFAVSL6SAVGCTW 
**** .***.**... ***^*****^***** **** * * ** * ***** * 

DFDP I GEaSPRQASnDRATQLAWaumiCDSGLDPJ^^ GVsi GTAVGCHG 

LEDEYVVVSDQGRDWLVDHSYG\ra.YRHLVPSSLAAEVAWAGGAE6P^ I STGCTSGL 
*. ♦*. **. *. **♦**. . ♦ . ♦. ..**.*. **♦*. . ******* ******** 

LDREYARVSEGGSRWLWHkAVECLroWTsicREVAWEAGAEGPV^ 

DAVGHGARV I AEGSAOVAUGATDAP I SP I TVACFDA I RATSPNNDDPEHASRPFDRERN 
**♦*.*. .* .*.***.. ****************** ^***^*****^ ******** * 

DAVGYGTEL I RDGRADVWCGATDAP I SP I TVACFDA I jb^TSMMMP/^RPFDRNRD 

GFVLGEGAAVFVLEELEHARRRGAHVYCEVAGYATRGNAYHMTGLKPDGREMAEA I RVAM 
******* ^rn*******^ ******* * ** * *** ** ***mmi^^:i^,^*#**ti* # 

GFVLGEGSAVFVLEELSAARRRGAHAYAEVRGFATRSlW^HMTGU^ 

DAARVAPADLDYI NAHGSGTKQNDRHETAAFKRSLGERAYELPVSS I KSMVGHSLGAIGS 
** . . **. *********^ *************** ^ m**^ ^ ********^ *******^ 

DQARRTGDOLHY I ^M^IGSGT^O>^DRHETAAFKRSLGQRAYbwVSS I KSMI GHSLGA I GS 
I ELAACALA I EHGWPPTANLHNADPECDLDYVPLVAREGR I RTVLSVGSGFGGFQSATV 

************* iH,**0 ^ ^ ^ 4r)|i*4i*««*4i:|i *^ *************** 4, 

LELAACAU I EHGvi PPTANYEEP0PEa)LJ)YVPNVAREQRWmSV6SGFGGFQSAAV 
425 

LREAA 
♦ 

LARPK 
422
1 60 
A MSVLTADAPAVTG 1 6WAPTG 1 6VEEHWAATLR6VPV I GPLTRFDASRYPSPFGGEVPGF 
.**.♦♦•**.♦.*.. .*.*.**. .**.****.****....*. * 
B USVLITGVGWAPN6LGUU>WSAVLDGRHGL(y 

1 54 
DAAERVPGRL IPOTDHIIinHJ\LMTDLAIJU)AGWPAELPEYEM^ 
.*....**♦*.*♦**. ♦.•*♦.•.* ** . ♦..♦..•.*,****_^** ^♦^^« 

HAPDHIPGRUJ»OTDPSTRLALTAADWALQDAIO«)PESLTDYDMGm 

E I QALWRDGPRHVGAYQS I AVFYAATTGQ I S I RHGMRGPC6VWAEQAGALESFA0ARRY 

*. . **. . **. ♦. . *. **♦♦*. . *♦♦**»***♦**♦ iMi4i*4n|i m 

EFRKLWSEGPKSVSVYESFAWFYAVNTGQI S I RHGMRGPSsia.VAEQAGGLDAL6H^ 

LAIXSARVWSGGTDAPFSPYGLTCQLGSGfLSTGADPARAYU'FDAAANa I L 

. *...****♦.*....*.* ..*..♦****..•*♦******. ♦.*.♦**•***♦• 

I RR(mVVSGGVDSALJ)PWGIIIIVSQI ASGRI STATD^ I L 

I lEQAATAQDRS ^YGRIAGYAATFDPPPGSGRPPTLERAVRAALDDARLTPADVDW 

..*..*.*..*. ♦♦♦♦^^ 

\a£OSAAAEARGRHDAYGELAGCASTFDPAPGSGRPAGLEIUIRLALND^ 

FADAAGVPDLDRAEADA I GAW6Pfl6\mAPKSLTGRLYAGGPAU)AATALLAUHDSV I 
0mm^****^** **4> ***^ **♦♦* Ki*^ 00000 00 00 000 00 

FAD6A6VPELDAAEARA I GRVFGRE6VP\aWldTTGRLYSGGGki)mAIJ^^ I 

407 

PPTAGGADVPPGYALDLVGAEPflPARLRTALI lARGYGGFNAALVLRGPNT 
.**** ..♦*..*..*** .♦**... ****..**♦ *♦♦. ♦ 

APTAG\nrSVPREYGI0LVL6EPRSTAPRTALVUWR«IIGFNSAA\fl^ 

405